THERAPEUTIC AND DIAGNOSTIC AGENTS CAPABLE OF MODULATING CELLULAR RESPONSIVENESS TO CYTOKINES

FIELD OF THE INVENTION 

The present invention relates generally to therapeutic and diagnostic agents. More particularly,
the present invention provides therapeutic molecules capable of modulating signal transduction
such as but not limited to cytokine-mediated signal transduction. The molecules of the present
invention are useful, therefore, in modulating cellular responsiveness to cytokines as well as other
mediators of signal transduction such as endogenous or exogenous molecules, antigens, microbes
and microbial products, viruses or components thereof, ions, hormones and parasites.

Bibliographic details of the publications referred to in this specification by author are collected
at the end of the description. Sequence Identity Numbers (SEQ ID NOs.) for the nucleotide and
amino acid sequences referred to in the specification are defined after the bibliography. A
summary of the SEQ ID NOs is given in Table 1.

Throughout this specification and the claims which follow, unless the context requires otherwise,
the word "comprise", or variations such as "comprises" or "comprising", will be understood to
imply the inclusion of a stated integer or group of integers but not the exclusion of any other
integer or group of integers.

BACKGROUND OF THE INVENTION 

Cells continually monitor their environment in order to modulate physiological and biochemical
processes which in turn affect future behaviour. Frequently, a cell's initial interaction with its
surroundings occurs via receptors expressed on the plasma membrane. Activation of these
receptors, whether through binding endogenous ligands (such as cytokines) or exogenous ligands
(such as antigens), triggers a biochemical cascade from the membrane through the cytoplasm to
the nucleus.


Of the endogenous ligands, cytokines represent a particularly important and versatile group.
Cytokines are proteins which regulate the survival, proliferation, differentiation and function of
a variety of cells within the body [Nicola, 1994]. The haemopoietic cytokines have in common
a four-alpha helical bundle structure and the vast majority interact with a structurally related
family of cell surface receptors, the type I and type II cytokine receptors [Bazan, 1990; Sprang,
1993]. In all cases, ligand-induced receptor aggregation appears to be a critical event in initiating
intracellular signal transduction cascades. Some cytokines, for example growth hormone,
erythropoietin (Epo) and granulocyte-colony-stimulating factor (G-CSF), trigger receptor
homodimerisation, while for other cytokines, receptor heterodimerisation or heterotrimerisation
is crucial. In the latter cases, several cytokines share common receptor subunits and on this basis
can be grouped into three subfamilies with similar patterns of intracellular activation and similar
biological effects [Hilton, 1994]. Interleukin-3 (IL-3), IL-5 and granulocyte-macrophage
colony-stimulating factor (GM-CSF) use the common β-receptor subunit (βc) and each cytokine
stimulates the production and functional activity of granulocytes and macrophages. IL-2, IL-4,
IL-7, IL-9 and IL-15 each use the common γ-chain (γc), while IL-4 and IL-13 share an
alternative γ-chain (γ'c or IL-13 receptor α-chain). Each of these cytokines plays an important
role in regulating acquired immunity in the lymphoid system. Finally, IL-6, IL-11, leukaemia
inhibitory factor (LIF), oncostatin-M (OSM), ciliary neurotrophic factor (CNTF) and
cardiotrophin (CT) share the receptor subunit gp130. Each of these cytokines appears to be
highly pleiotropic, having effects both within and outside the haemopoietic system [Nicola,
1994].

In all of the above cases at least one subunit of each receptor complex contains the conserved
sequence elements, termed box1 and box2, in their cytoplasmic tails [Murakami, 1991]. Box1
is a proline-rich motif which is located more proximal to the transmembrane domain than the
acidic box2 element. The box1 region serves as the binding site for a class of cytoplasmic
tyrosine kinases termed JAKs (Janus kinases). Ligand-induced receptor dimerisation serves to
increase the catalytic activity of the associated JAKs through cross-phosphorylation. Activated
JAKs then tyrosine phosphorylate several substrates, including the receptors themselves.
Specific phosphotyrosine residues on the receptor then serve as docking sites for SH2-containing
proteins, the best characterised of which are the signal transducers and activators of transcription
(STATs) and the adaptor protein, Shc. The STATs are then phosphorylated on tyrosines,
probably by JAKs, dissociate from the receptor and form either homodimers or heterodimers
through the interaction of the SH2 domain of one STAT with the phosphotyrosine residue of the
other. STAT dimers then translocate to the nucleus where they bind to specific
cytokine-responsive promoters and activate transcription [Darnell, 1994; Dile, 1995; Ihle, 1995].
In a separate pathway, tyrosine phosphorylated Shc interacts with another SH2 domain-containing
protein, Grb-2, leading ultimately to activation of members of the MAP kinase family and in turn
transcription factors such as fos and jun [Sato, 1993; Cutler, 1993]. These pathways are not
unique to members of the cytokine receptor family, since cytokines that bind receptor tyrosine
kinases are also able to activate STATs and members of the MAP kinase family [David, 1996;
Leaman, 1996; Shuai, 1993; Sato, 1993; Cutler, 1993].

Four members of the JAK family of cytoplasmic tyrosine kinases have been described, JAK1,
JAK2, JAK3 and TYK2, each of which binds to a specific subset of cytokine receptor subunits.
Six STATs have been described (STAT1 through STAT6), and these too are activated by
distinct cytokine/receptor complexes. For example, STAT1 appears to be functionally specific
to the interferon system, STAT4 appears to be specific to IL-12, while STAT6 appears to be
specific for IL-4 and IL-13. Thus, despite common activation mechanisms, some degree of
cytokine specificity may be achieved through the use of specific JAKs and STATs [Thierfelder,
1996; Kaplan, 1996; Takeda, 1996; Shimoda, 1996; Meraz, 1996; Durbin, 1996].


In addition to those described above, there are clearly other mechanisms of activation of these
pathways. For example, the JAK/STAT pathway appears to be able to activate MAP kinases
independently of the Shc-induced pathway [David, 1995] and the STATs themselves can be
activated without binding to the receptor, possibly by direct interaction with JAKs [Gupta,
1996]. Conversely, activation of STATs may require the action of MAP kinase in addition
to that of JAKs [David, 1995; Wen, 1995].

While the activation of these signalling pathways is becoming better understood, little is known
of the regulation of these pathways, including employment of negative or positive feedback
loops. This is important since, once a cell has begun to respond to a stimulus, it is critical that
the intensity and duration of the response are regulated and that signal transduction is switched
off. It is likewise desirable to increase the intensity of a response systemically or even locally as
the situation requires.

In work leading up to the present invention, the inventors sought to isolate negative regulators
of signal transduction. The inventors have now identified a new family of proteins which are
capable of acting as regulators of signalling. The new family of proteins is defined as the
suppressor of cytokine signalling (SOCS) family based on the ability of the initially identified
SOCS molecules to suppress cytokine-mediated signalling. It should be noted, however, that
not all members of the SOCS family need necessarily share suppressor function nor target solely
cytokine-mediated signalling. The SOCS family comprises at least three classes of protein
molecules based on amino acid sequence motifs located N-terminal of a C-terminal motif called
the SOCS box. The identification of this new family of regulatory molecules permits the
generation of a range of effector or modulator molecules capable of modulating signal
transduction and, hence, cellular responsiveness to a range of molecules including cytokines.

The present invention, therefore, provides therapeutic and diagnostic agents based on SOCS
proteins, derivatives, homologues, analogues and mimetics thereof as well as agonists and
antagonists of SOCS proteins.

SUMMARY OF THE INVENTION 


The present invention provides inter alia nucleic acid molecules encoding members of the SOCS
family of proteins as well as the proteins themselves. Reference hereinafter to "SOCS"
encompasses any or all members of the SOCS family. Specific SOCS molecules are defined
numerically such as, for example, SOCS1, SOCS2 and SOCS3. The species from which the
SOCS has been obtained may be indicated by a preface of a single letter abbreviation where "h"
is human, "m" is murine and "r" is rat. Accordingly, "mSOCS1" is a specific SOCS from a murine
animal. Reference herein to "SOCS" is not to imply that the protein solely suppresses
cytokine-mediated signal transduction, as the molecule may modulate other effector-mediated
signal transductions such as by hormones or other endogenous or exogenous molecules,
antigens, microbes and microbial products, viruses or components thereof, ions, hormones and
parasites. The term "modulates" encompasses up-regulation, down-regulation as well as
maintenance of particular levels.
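The naming convention above (an optional one-letter species prefix followed by "SOCS" and a member number) is regular enough to be parsed mechanically. The Python sketch below is illustrative only: the helper `parse_socs_name` and the `SocsName` type are hypothetical names introduced here, not part of the specification, and the sketch assumes designations are written without internal spaces (for example "mSOCS1" or "hSOCS15").

```python
import re
from typing import NamedTuple, Optional

# Species prefixes as defined in the specification: h = human, m = murine, r = rat.
SPECIES = {"h": "human", "m": "murine", "r": "rat"}

# Optional one-letter species prefix, the literal "SOCS", then the member number.
_SOCS_NAME = re.compile(r"([hmr]?)SOCS(\d+)")


class SocsName(NamedTuple):
    species: Optional[str]  # None when no species prefix is given
    number: int             # e.g. 1 for SOCS1


def parse_socs_name(name: str) -> SocsName:
    """Split a designation such as "mSOCS1" into species and member number (hypothetical helper)."""
    match = _SOCS_NAME.fullmatch(name.strip())
    if match is None:
        raise ValueError(f"not a SOCS designation: {name!r}")
    prefix, number = match.groups()
    # An empty prefix maps to None via dict.get.
    return SocsName(SPECIES.get(prefix), int(number))


if __name__ == "__main__":
    print(parse_socs_name("mSOCS1"))   # SocsName(species='murine', number=1)
    print(parse_socs_name("hSOCS15"))  # SocsName(species='human', number=15)
    print(parse_socs_name("SOCS3"))    # SocsName(species=None, number=3)
```

A designation with an unknown prefix, such as "xSOCS1", is rejected with a `ValueError` rather than silently assigned a species.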

One aspect of the present invention provides a nucleic acid molecule comprising a sequence of
nucleotides encoding or complementary to a sequence encoding a protein or a derivative,
homologue, analogue or mimetic thereof or a nucleotide sequence capable of hybridizing thereto
under low stringency conditions at 42°C wherein said protein comprises a SOCS box in its
C-terminal region.

Another aspect of the present invention provides a nucleic acid molecule comprising a sequence
of nucleotides encoding or complementary to a sequence encoding a protein or a derivative,
homologue, analogue or mimetic thereof or a nucleotide sequence capable of hybridizing thereto
under low stringency conditions at 42°C wherein said protein comprises a SOCS box in its
C-terminal region and a protein:molecule interacting region.

Yet another aspect of the present invention is directed to a nucleic acid molecule comprising a
sequence of nucleotides encoding or complementary to a sequence encoding a protein or a
derivative, homologue, analogue or mimetic thereof or a nucleotide sequence capable of
hybridizing thereto under low stringency conditions at 42°C wherein said protein comprises a
SOCS box in its C-terminal region and a protein:molecule interacting region located in a region
N-terminal of the SOCS box.

Preferably, the protein:molecule interacting region is a protein:DNA or protein:protein binding
region.

Still a further aspect of the present invention provides a nucleic acid molecule comprising a
sequence of nucleotides encoding or complementary to a sequence encoding a protein or a
derivative, homologue, analogue or mimetic thereof or a nucleotide sequence capable of
hybridizing thereto under low stringency conditions at 42°C wherein said protein comprises a
SOCS box in its C-terminal region and one or more of an SH2 domain, WD-40 repeats or
ankyrin repeats N-terminal of the SOCS box.
Even still a further aspect of the present invention is directed to a nucleic acid molecule
comprising a sequence of nucleotides encoding or complementary to a sequence encoding a
protein or a derivative, homologue, analogue or mimetic thereof or a nucleotide sequence
capable of hybridizing thereto under low stringency conditions at 42°C wherein said protein
comprises a SOCS box in its C-terminal region wherein the SOCS box comprises the amino acid
sequence:

X1 X2 X3 X4 X5 X6 X7 X8 X9 X10 X11 X12 X13 X14 X15 X16 [Xi]n X17 X18 X19 X20 X21 X22 X23 [Xj]n X24 X25 X26 X27 X28

wherein: X1 is L, I, V, M, A or P;
X2 is any amino acid residue;
X3 is P, T or S;
X4 is L, I, V, M, A or P;
X5 is any amino acid;
X6 is any amino acid;
X7 is L, I, V, M, A, F, Y or W;
X8 is C, T or S;
X9 is R, K or H;
X10 is any amino acid;
X11 is any amino acid;
X12 is L, I, V, M, A or P;
X13 is any amino acid;
X14 is any amino acid;
X15 is any amino acid;
X16 is L, I, V, M, A, P, G, C, T or S;
[Xi]n is a sequence of n amino acids wherein n is from 1 to 50 amino acids
and wherein the sequence Xi may comprise the same or different amino
acids selected from any amino acid residue;
X17 is L, I, V, M, A or P;
X18 is any amino acid;
X19 is any amino acid;
X20 is L, I, V, M, A or P;
X21 is P;
X22 is L, I, V, M, A, P or G;
X23 is P or N;
[Xj]n is a sequence of n amino acids wherein n is from 1 to 50 amino acids
and wherein the sequence Xj may comprise the same or different amino
acids selected from any amino acid residue;
X24 is L, I, V, M, A or P;
X25 is any amino acid;
X26 is any amino acid;
X27 is Y or F;
X28 is L, I, V, M, A or P;

and a protein:molecule interacting region such as but not limited to one or more of an SH2
domain, WD-40 repeats and/or ankyrin repeats N-terminal of the SOCS box.
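Because every position of the SOCS box consensus is either a fixed residue, a small set of alternative residues or an unconstrained residue, interrupted by two variable-length spacers, the consensus can be written as a regular expression. The Python sketch below is an illustrative reading of the definition above, not a method described in the specification: the function names `socs_box_regex` and `find_socs_box` are hypothetical, "any amino acid" is assumed to mean one of the 20 standard one-letter residue codes, and the `max_n` parameter (default 50) bounds both spacers [Xi]n and [Xj]n so that the narrower ranges for n given in the specification (for example 1 to 30 or 1 to 5) can also be tried. A match only shows agreement with the SOCS box consensus; it says nothing about the protein:molecule interacting region N-terminal of the SOCS box.

```python
import re
from typing import Optional, Tuple

# Assumption: "any amino acid" means one of the 20 standard one-letter residue codes.
ANY = "[ACDEFGHIKLMNPQRSTVWY]"
LIVMAP = "[LIVMAP]"  # the recurring "L, I, V, M, A or P" positions


def socs_box_regex(max_n: int = 50) -> "re.Pattern[str]":
    """Compile the SOCS box consensus X1 ... X28 as a regular expression (illustrative sketch).

    max_n is the upper bound on n for both spacers [Xi]n and [Xj]n; n always starts at 1.
    """
    if max_n < 1:
        raise ValueError("max_n must be at least 1")
    spacer = f"{ANY}{{1,{max_n}}}"  # [Xi]n or [Xj]n: 1 to max_n residues
    parts = [
        LIVMAP,            # X1
        ANY,               # X2
        "[PTS]",           # X3
        LIVMAP,            # X4
        ANY, ANY,          # X5, X6
        "[LIVMAFYW]",      # X7
        "[CTS]",           # X8
        "[RKH]",           # X9
        ANY, ANY,          # X10, X11
        LIVMAP,            # X12
        ANY, ANY, ANY,     # X13 to X15
        "[LIVMAPGCTS]",    # X16
        spacer,            # [Xi]n
        LIVMAP,            # X17
        ANY, ANY,          # X18, X19
        LIVMAP,            # X20
        "P",               # X21
        "[LIVMAPG]",       # X22
        "[PN]",            # X23
        spacer,            # [Xj]n
        LIVMAP,            # X24
        ANY, ANY,          # X25, X26
        "[YF]",            # X27
        LIVMAP,            # X28
    ]
    return re.compile("".join(parts))


def find_socs_box(protein: str, max_n: int = 50) -> Optional[Tuple[int, int]]:
    """Return (start, end) offsets of the leftmost consensus match, or None if there is none."""
    match = socs_box_regex(max_n).search(protein.upper())
    return match.span() if match else None


if __name__ == "__main__":
    # Synthetic sequence built to the consensus (not a real SOCS protein), with
    # three-residue spacers and flanking residues chosen so no other alignment fits.
    core = "LAPLQELCRQRIVAAV" + "GRE" + "LARIPLN" + "PVL" + "LRDYL"
    seq = "MSS" + core + "QQQ"
    print(find_socs_box(seq))           # (3, 37)
    print(find_socs_box(seq, max_n=2))  # None: both spacers here are 3 residues long
```

Lowering `max_n` is a simple way to see how the narrower spacer ranges restrict which sequences qualify.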

Another aspect of the present invention is directed to a nucleic acid molecule comprising a
sequence of nucleotides encoding or complementary to a sequence encoding a protein or a
derivative, homologue, analogue or mimetic thereof or a nucleotide sequence capable of
hybridizing thereto under low stringency conditions at 42°C wherein said protein exhibits the
following characteristics:

(i) comprises a SOCS box in its C-terminal region having the amino acid sequence:

X1 X2 X3 X4 X5 X6 X7 X8 X9 X10 X11 X12 X13 X14 X15 X16 [Xi]n X17 X18 X19 X20 X21 X22 X23 [Xj]n X24 X25 X26 X27 X28

wherein: X1 is L, I, V, M, A or P;
X2 is any amino acid residue;
X3 is P, T or S;
X4 is L, I, V, M, A or P;
X5 is any amino acid;
X6 is any amino acid;
X7 is L, I, V, M, A, F, Y or W;
X8 is C, T or S;
X9 is R, K or H;
X10 is any amino acid;
X11 is any amino acid;
X12 is L, I, V, M, A or P;
X13 is any amino acid;
X14 is any amino acid;
X15 is any amino acid;
X16 is L, I, V, M, A, P, G, C, T or S;
[Xi]n is a sequence of n amino acids wherein n is from 1 to 50 amino acids
and wherein the sequence Xi may comprise the same or different amino
acids selected from any amino acid residue;
X17 is L, I, V, M, A or P;
X18 is any amino acid;
X19 is any amino acid;
X20 is L, I, V, M, A or P;
X21 is P;
X22 is L, I, V, M, A, P or G;
X23 is P or N;
[Xj]n is a sequence of n amino acids wherein n is from 1 to 50 amino acids
and wherein the sequence Xj may comprise the same or different amino
acids selected from any amino acid residue;
X24 is L, I, V, M, A or P;
X25 is any amino acid;
X26 is any amino acid;
X27 is Y or F;
X28 is L, I, V, M, A or P; and

(ii) comprises at least one of an SH2 domain, WD-40 repeats and/or ankyrin repeats or other
protein:molecule interacting domain in a region N-terminal of the SOCS box.

WO 98/20023                                                    PCT/AU97/00729

Preferably, the SOCS molecules modulate signal transduction such as from a cytokine or
hormone or other endogenous or exogenous molecule, a microbe or microbial product, an
antigen or a parasite.

More preferably, the SOCS molecules modulate cytokine-mediated signal transduction.

Still another aspect of the present invention provides a nucleic acid molecule comprising a
sequence of nucleotides encoding or complementary to a sequence encoding a protein or a
derivative, homologue, analogue or mimetic thereof or comprises a nucleotide sequence capable
of hybridizing thereto under low stringency conditions at 42°C wherein said protein exhibits the
following characteristics:

(i) is capable of modulating signal transduction;
(ii) comprises a SOCS box in its C-terminal region having the amino acid sequence:

X1 X2 X3 X4 X5 X6 X7 X8 X9 X10 X11 X12 X13 X14 X15 X16 [Xi]n X17 X18 X19 X20 X21 X22 X23 [Xj]n X24 X25 X26 X27 X28

wherein: X1 is L, I, V, M, A or P;
X2 is any amino acid residue;
X3 is P, T or S;
X4 is L, I, V, M, A or P;
X5 is any amino acid;
X6 is any amino acid;
X7 is L, I, V, M, A, F, Y or W;
X8 is C, T or S;
X9 is R, K or H;
X10 is any amino acid;
X11 is any amino acid;
X12 is L, I, V, M, A or P;
X13 is any amino acid;
X14 is any amino acid;
X15 is any amino acid;
X16 is L, I, V, M, A, P, G, C, T or S;
[Xi]n is a sequence of n amino acids wherein n is from 1 to 50 amino acids
and wherein the sequence Xi may comprise the same or different amino
acids selected from any amino acid residue;
X17 is L, I, V, M, A or P;
X18 is any amino acid;
X19 is any amino acid;
X20 is L, I, V, M, A or P;
X21 is P;
X22 is L, I, V, M, A, P or G;
X23 is P or N;
[Xj]n is a sequence of n amino acids wherein n is from 1 to 50 amino acids
and wherein the sequence Xj may comprise the same or different amino
acids selected from any amino acid residue;
X24 is L, I, V, M, A or P;
X25 is any amino acid;
X26 is any amino acid;
X27 is Y or F;
X28 is L, I, V, M, A or P; and

(iii) comprises at least one of an SH2 domain, WD-40 repeats and/or ankyrin repeats or other
protein:molecule interacting domain in a region N-terminal of the SOCS box.

Preferably, the signal transduction is mediated by a cytokine such as one or more of EPO, TPO,
G-CSF, GM-CSF, IL-3, IL-2, IL-4, IL-7, IL-13, IL-6, LIF, IL-12, IFNα, TNFα, IL-1 and/or
M-CSF.


Preferably, the signal transduction is mediated by one or more of Interleukin 6 (IL-6), Leukaemia
Inhibitory Factor (LIF), Oncostatin M (OSM), Interferon (IFN)-γ and/or thrombopoietin.

Preferably, the signal transduction is mediated by IL-6. 

Particularly preferred nucleic acid molecules comprise nucleotide sequences substantially as set
forth in SEQ ID NO:3 (mSOCS1), SEQ ID NO:5 (mSOCS2), SEQ ID NO:7 (mSOCS3), SEQ
ID NO:9 (hSOCS1), SEQ ID NO:11 (rSOCS1), SEQ ID NO:13 (mSOCS4), SEQ ID NO:15
and SEQ ID NO:16 (hSOCS4), SEQ ID NO:17 (mSOCS5), SEQ ID NO:19 (hSOCS5), SEQ
ID NO:20 (mSOCS6), SEQ ID NO:22 and SEQ ID NO:23 (hSOCS6), SEQ ID NO:24
(mSOCS7), SEQ ID NO:26 and SEQ ID NO:27 (hSOCS7), SEQ ID NO:28 (mSOCS8), SEQ
ID NO:30 (mSOCS9), SEQ ID NO:31 (hSOCS9), SEQ ID NO:32 (mSOCS10), SEQ ID NO:33
and SEQ ID NO:34 (hSOCS10), SEQ ID NO:35 (hSOCS11), SEQ ID NO:37 (mSOCS12),
SEQ ID NO:38 and SEQ ID NO:39 (hSOCS12), SEQ ID NO:40 (mSOCS13), SEQ ID NO:42
(hSOCS13), SEQ ID NO:43 (mSOCS14), SEQ ID NO:45 (mSOCS15) and SEQ ID NO:47
(hSOCS15) or a nucleotide sequence having at least about 15% similarity to all or a region of
any of the listed sequences or a nucleic acid molecule capable of hybridizing to any one of the
listed sequences under low stringency conditions at 42°C.

Another aspect of the present invention relates to a protein or a derivative, homologue, analogue
or mimetic thereof comprising a SOCS box in its C-terminal region.

Yet another aspect of the present invention is directed to a protein or a derivative, homologue, 
analogue or mimetic thereof comprising a SOCS box in its C-terminal region and a 
protein:molecule interacting region. 


Even yet another aspect of the present invention provides a protein or a derivative, homologue,
analogue or mimetic thereof comprising a SOCS box in its C-terminal region and a
protein:molecule interacting region located in a region N-terminal of the SOCS box.

Preferably, the protein:molecule interacting region is a protein:DNA or a protein:protein binding
region.
Another aspect of the present invention contemplates a protein or a derivative, homologue,
analogue or mimetic thereof comprising a SOCS box in its C-terminal region and an SH2 domain,
WD-40 repeats or ankyrin repeats N-terminal of the SOCS box.

Still yet another aspect of the present invention provides a protein or a derivative, homologue,
analogue or mimetic thereof exhibiting the following characteristics:

(i) comprises a SOCS box in its C-terminal region having the amino acid sequence:

X1 X2 X3 X4 X5 X6 X7 X8 X9 X10 X11 X12 X13 X14 X15 X16 [Xi]n X17 X18 X19 X20 X21 X22 X23 [Xj]n X24 X25 X26 X27 X28

wherein: X1 is L, I, V, M, A or P;
X2 is any amino acid residue;
X3 is P, T or S;
X4 is L, I, V, M, A or P;
X5 is any amino acid;
X6 is any amino acid;
X7 is L, I, V, M, A, F, Y or W;
X8 is C, T or S;
X9 is R, K or H;
X10 is any amino acid;
X11 is any amino acid;
X12 is L, I, V, M, A or P;
X13 is any amino acid;
X14 is any amino acid;
X15 is any amino acid;
X16 is L, I, V, M, A, P, G, C, T or S;
[Xi]n is a sequence of n amino acids wherein n is from 1 to 50 amino acids
and wherein the sequence Xi may comprise the same or different amino
acids selected from any amino acid residue;
X17 is L, I, V, M, A or P;
X18 is any amino acid;
X19 is any amino acid;
X20 is L, I, V, M, A or P;
X21 is P;
X22 is L, I, V, M, A, P or G;
X23 is P or N;
[Xj]n is a sequence of n amino acids wherein n is from 1 to 50 amino acids
and wherein the sequence Xj may comprise the same or different amino
acids selected from any amino acid residue;
X24 is L, I, V, M, A or P;
X25 is any amino acid;
X26 is any amino acid;
X27 is Y or F;
X28 is L, I, V, M, A or P; and

(ii) comprises at least one of an SH2 domain, WD-40 repeats and/or ankyrin repeats or other
protein:molecule interacting domain in a region N-terminal of the SOCS box.

Preferably, the proteins modulate signal transduction such as cytokine-mediated signal
transduction.

Preferred cytokines are EPO, TPO, G-CSF, GM-CSF, IL-3, IL-2, IL-4, IL-7, IL-13, IL-6, LIF,
IL-12, IFNγ, TNFα, IL-1 and/or M-CSF.


A particularly preferred cytokine is IL-6. 

Even yet another aspect of the present invention provides a protein or derivative, homologue,
analogue or mimetic thereof exhibiting the following characteristics:

(i) is capable of modulating signal transduction such as cytokine-mediated signal
transduction;
(ii) comprises a SOCS box in its C-terminal region having the amino acid sequence:

X1 X2 X3 X4 X5 X6 X7 X8 X9 X10 X11 X12 X13 X14 X15 X16 [Xi]n X17 X18 X19 X20 X21 X22 X23 [Xj]n X24 X25 X26 X27 X28

wherein: X1 is L, I, V, M, A or P;
X2 is any amino acid residue;
X3 is P, T or S;
X4 is L, I, V, M, A or P;
X5 is any amino acid;
X6 is any amino acid;
X7 is L, I, V, M, A, F, Y or W;
X8 is C, T or S;
X9 is R, K or H;
X10 is any amino acid;
X11 is any amino acid;
X12 is L, I, V, M, A or P;
X13 is any amino acid;
X14 is any amino acid;
X15 is any amino acid;
X16 is L, I, V, M, A, P, G, C, T or S;
[Xi]n is a sequence of n amino acids wherein n is from 1 to 50 amino acids
and wherein the sequence Xi may comprise the same or different amino
acids selected from any amino acid residue;
X17 is L, I, V, M, A or P;
X18 is any amino acid;
X19 is any amino acid;
X20 is L, I, V, M, A or P;
X21 is P;
X22 is L, I, V, M, A, P or G;
X23 is P or N;
[Xj]n is a sequence of n amino acids wherein n is from 1 to 50 amino acids
and wherein the sequence Xj may comprise the same or different amino
acids selected from any amino acid residue;
X24 is L, I, V, M, A or P;
X25 is any amino acid;
X26 is any amino acid;
X27 is Y or F;
X28 is L, I, V, M, A or P; and

(iii) comprises at least one of an SH2 domain, WD-40 repeats and/or ankyrin repeats or other
protein:molecule interacting domain in a region N-terminal of the SOCS box.



Particularly preferred SOCS proteins comprise an amino acid sequence substantially as set forth
in SEQ ID NO:4 (mSOCS1), SEQ ID NO:6 (mSOCS2), SEQ ID NO:8 (mSOCS3), SEQ ID
NO:10 (hSOCS1), SEQ ID NO:12 (rSOCS1), SEQ ID NO:14 (mSOCS4), SEQ ID NO:18
(mSOCS5), SEQ ID NO:21 (mSOCS6), SEQ ID NO:25 (mSOCS7), SEQ ID NO:29
(mSOCS8), SEQ ID NO:36 (hSOCS11), SEQ ID NO:41 (mSOCS13), SEQ ID NO:44
(mSOCS14), SEQ ID NO:46 (mSOCS15) and SEQ ID NO:48 (hSOCS15) or an amino acid
sequence having at least 15% similarity to all or a region of any one of the listed sequences.


Another aspect of the present invention contemplates a method of modulating levels of a SOCS 
protein in a cell said method comprising contacting a cell containing a SOCS gene with an 
effective amount of a modulator of SOCS gene expression or SOCS protein activity for a time 
and under conditions sufficient to modulate levels of said SOCS protein. 


A related aspect of the present invention provides a method of modulating signal transduction 
in a cell containing a SOCS gene comprising contacting said cell with an effective amount of a 
modulator of SOCS gene expression or SOCS protein activity for a time sufficient to modulate 
signal transduction. 


Yet a further related aspect of the present invention is directed to a method of influencing
interaction between cells wherein at least one cell carries a SOCS gene, said method comprising
contacting the cell carrying the SOCS gene with an effective amount of a modulator of SOCS
gene expression or SOCS protein activity for a time sufficient to modulate signal transduction.

In accordance with the present invention, n in [Xi]n and [Xj]n may, in addition to being 1-50,
be 1-30, 1-20, 1-10 or 1-5.

A summary of the SEQ ID NOs referred to in the subject specification is given in Table 1. 

TABLE 1

SUMMARY OF SEQUENCE IDENTITY NUMBERS

SEQUENCE                                                SEQ ID NO.

PCR Primer                                              1
PCR Primer                                              2
Mouse SOCS1 (nucleotide)                                3
Mouse SOCS1 (amino acid)                                4
Mouse SOCS2 (nucleotide)                                5
Mouse SOCS2 (amino acid)                                6
Mouse SOCS3 (nucleotide)                                7
Mouse SOCS3 (amino acid)                                8
Human SOCS1 (nucleotide)                                9
Human SOCS1 (amino acid)                                10
Rat SOCS1 (nucleotide)                                  11
Rat SOCS1 (amino acid)                                  12
nucleotide sequence of murine SOCS4                     13
amino acid sequence of murine SOCS4                     14
nucleotide sequence of SOCS4 cDNA human contig 4.1      15
nucleotide sequence of SOCS4 cDNA human contig 4.2      16
nucleotide sequence of murine SOCS5                     17
amino acid sequence of murine SOCS5                     18
nucleotide sequence of human SOCS5                      19
nucleotide sequence of murine SOCS6                     20
amino acid sequence of murine SOCS6                     21
nucleotide sequence of human SOCS6 contig h6.1          22
nucleotide sequence of human SOCS6 contig h6.2          23
nucleotide sequence of murine SOCS7                     24
amino acid sequence of murine SOCS7                     25
nucleotide sequence of human SOCS7 contig h7.1          26
nucleotide sequence of human SOCS7 contig h7.2          27
nucleotide sequence of murine SOCS8                     28
amino acid sequence of murine SOCS8                     29
nucleotide sequence of murine SOCS9                     30
nucleotide sequence of human SOCS9                      31
nucleotide sequence of murine SOCS10                    32
nucleotide sequence of human SOCS10 contig h10.1        33
nucleotide sequence of human SOCS10 contig h10.2        34
nucleotide sequence of human SOCS11                     35
amino acid sequence of human SOCS11                     36
nucleotide sequence of mouse SOCS12                     37
nucleotide sequence of human SOCS12 contig h12.1        38
nucleotide sequence of human SOCS12 contig h12.2        39
nucleotide sequence of murine SOCS13                    40
amino acid sequence of murine SOCS13                    41
nucleotide sequence of human SOCS13 cDNA contig h13.1   42
nucleotide sequence of murine SOCS14 cDNA               43
amino acid sequence of murine SOCS14                    44
nucleotide sequence of murine SOCS15 cDNA               45
amino acid sequence of murine SOCS15                    46
nucleotide sequence of human SOCS15                     47
amino acid sequence of human SOCS15                     48
Single and three letter abbreviations are used to denote amino acid residues and these are
summarized in Table 2.

TABLE 2

Amino Acid        Three-letter    One-letter
                  Abbreviation    Symbol

Alanine           Ala             A
Arginine          Arg             R
Asparagine        Asn             N
Aspartic acid     Asp             D
Cysteine          Cys             C
Glutamine         Gln             Q
Glutamic acid     Glu             E
Glycine           Gly             G
Histidine         His             H
Isoleucine        Ile             I
Leucine           Leu             L
Lysine            Lys             K
Methionine        Met             M
Phenylalanine     Phe             F
Proline           Pro             P
Serine            Ser             S
Threonine         Thr             T
Tryptophan        Trp             W
Tyrosine          Tyr             Y
Valine            Val             V
Any residue       Xaa             X
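Table 2 is effectively a lookup table, so converting a sequence written with three-letter abbreviations into the one-letter form used in the SOCS box definitions amounts to a dictionary lookup. The Python sketch below is illustrative only: `THREE_TO_ONE` and `to_one_letter` are hypothetical names introduced here, and the sketch assumes the input is already split into separate three-letter tokens such as "Met" or "MET".

```python
# One-letter symbols keyed by three-letter abbreviation, as listed in Table 2.
THREE_TO_ONE = {
    "Ala": "A", "Arg": "R", "Asn": "N", "Asp": "D", "Cys": "C",
    "Gln": "Q", "Glu": "E", "Gly": "G", "His": "H", "Ile": "I",
    "Leu": "L", "Lys": "K", "Met": "M", "Phe": "F", "Pro": "P",
    "Ser": "S", "Thr": "T", "Trp": "W", "Tyr": "Y", "Val": "V",
    "Xaa": "X",  # any residue
}


def to_one_letter(codes):
    """Convert an iterable of three-letter codes (any letter case) to a one-letter string.

    str.capitalize() normalises "MET" or "met" to "Met" before the lookup.
    Raises KeyError for a code that is not listed in Table 2.
    """
    return "".join(THREE_TO_ONE[code.capitalize()] for code in codes)


if __name__ == "__main__":
    print(to_one_letter(["Met", "ALA", "xaa"]))  # MAX
```

Raising `KeyError` on an unlisted code, rather than substituting "X", keeps typing errors in the input from being silently read as "any residue".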
BRIEF DESCRIPTION OF THE DRAWINGS

In some of the Figures, abbreviations are used to denote SOCS proteins with certain binding
motifs. SOCS proteins which contain WD-40 repeats are referred to as WSB1-WSB4. SOCS
proteins with ankyrin repeats are referred to as ASB1-ASB3.

Figure 1 is a diagrammatic representation showing generation of an IL-6-unresponsive M1 clone
by retroviral infection. The figure shows the RUFneo retrovirus, with the positions of landmark
restriction endonuclease cleavage sites, the 4A2 cDNA insert and the PCR primer sequences.


Figure 2 is a photographic representation of Southern and Northern analysis. (Left and Middle
Panels) Southern blot analysis of genomic DNA from clone 4A2 and a control infected M1 clone.
DNA was digested with BamH I, to reveal the number of retroviruses carried by each clone, and
Sac I, to estimate the size of the retroviral cDNA insert. Left panel: probed with neo. Middle
panel: probed with the Xho I-digested 4A2 PCR product. (Right Panel) Northern blot analysis
of total RNA from clone 4A2 and a control infected M1 clone, probed with the Xho I-digested
4A2 PCR product. The two bands represent unspliced and spliced retroviral transcripts,
resulting from splice donor and acceptor sites in the retroviral genome.

Figure 3 is a representation of the nucleotide sequence and structure of the SOCS1 gene. A.
The genomic context of SOCS1 in relation to the protamine gene cluster on murine chromosome
16. The accession number of this locus is MMPRMGNS (direct submission; G. Schlueter, 1995)
for the mouse and BTPRMTNP2 for the rat (direct submission; G. Schlueter, 1996). B. The
nucleotide sequence of the SOCS1 cDNA and deduced amino acid sequence. Conventional
one-letter abbreviations are used for the amino acid sequence and the asterisk indicates the stop
codon. The polyadenylation signal sequence is underlined. The coding region is shown in
uppercase and the untranslated region is shown in lower case.

Figure 4 is a graphical representation of cell differentiation in the presence of cytokines.
Semi-solid agar cultures of parental M1 cells (M1 and M1.mpl) and M1 cells expressing SOCS1
(4A2 and M1.mpl.SOCS1) were used, and the percentage of colonies which differentiated in
response to a titration of 1 mg/ml IL-6 (•), 100 ng/ml LIF (0), 1 mg/ml OSM (□), 100 ng/ml
IFN-γ (^), 500 ng/ml TPO (•) or 3x10"^ M dexamethasone (*) was determined.

Figure 5 is a photographic representation of cytospins of liquid cultures of parental M1 cells
(M1 and M1.mpl) and M1 cells expressing SOCS1 (4A2 and M1.mpl.SOCS1) cultured for 4
days in the presence of 10 ng/ml IL-6 or saline. Unlike parental M1 cells, M1 cells
constitutively expressing SOCS1 (4A2 and M1.mpl.SOCS1) do not show morphological features
consistent with macrophage differentiation when cultured in IL-6.

Figure 6 is a photographic representation showing inhibition of phosphorylation of signalling
molecules by SOCS1. Parental M1 cells (M1 and M1.mpl) and M1 cells expressing SOCS1
(4A2 and M1.mpl.SOCS1) were incubated in the absence (-) or presence (+) of 10 ng/ml of IL-6
for 4 minutes at 37°C. Cells were then lysed and extracts were either immunoprecipitated using
anti-mouse gp130 antibody prior to SDS-PAGE (two upper panels) or were electrophoresed
directly (two lower panels). Gels were blotted and the filters were then probed with
anti-phosphotyrosine (upper panel), anti-gp130 antibody (second top panel), anti-phospho-STAT3
(second bottom panel) or anti-STAT3 (lower panel). Blots were visualised using
peroxidase-conjugated secondary antibodies and Enhanced Chemiluminescence (ECL) reagents.

Figure 7 is a representation of protein extracts prepared from (A) M1 cells or M1 cells
expressing SOCS1 (4A2) and (B) M1.mpl cells or M1.mpl.SOCS1 cells incubated for 10 min
at 37°C in 10 ml serum-free DME containing either saline, 100 ng/ml IL-6 or 100 ng/ml IFN-γ.
The binding reactions contained 4-6 μg protein (constant within a given experiment), 5 ng
³²P-labelled m67 oligonucleotide encoding the high affinity SIF (c-sis-inducible factor) binding
site, and 800 ng sonicated salmon sperm DNA. For certain experiments, protein samples were
preincubated with an excess of unlabelled m67 oligonucleotide, or antibodies specific for either
STAT1 or STAT3.

Figure 8 is a photographic representation of Northern hybridisation. Mice were injected
intravenously with 2 μg and after various periods of time, the livers were removed and polyA+
mRNA was purified. M1 cells were stimulated for various lengths of time with 500 ng/ml of
IL-6, after which polyA+ mRNA was isolated. mRNA was fractionated by electrophoresis and
immobilized on nylon filters. Northern blots were prehybridized, hybridized with random-primed
³²P-labelled SOCS1 or GAPDH DNA fragments, washed and exposed to film overnight.

Figure 9 is a representation of a comparison of the amino acid sequences of SOCS1, SOCS2,
SOCS3 and CIS. Alignment of the predicted amino acid sequence of mouse (mm), human (hs)
and rat (rr) SOCS1, SOCS2, SOCS3 and CIS. Those residues shaded are conserved in three or
four mouse SOCS family members. The SH2 domain is boxed in solid lines, while the SOCS box
is bounded by double lines.

Figure 10 is a photographic representation showing the phenotype of the IL-6 unresponsive M1
cell clone 4A2. Colonies of parental M1 cells (left panel) and clone 4A2 (right panel) cultured in
semi-solid agar for 7 days in saline or 100 ng/ml IL-6.

Figure 11 is a photographic representation showing expression of mRNA for SOCS family
members in vitro and in vivo.

(A) Northern analysis of mRNA from a range of mouse organs showing constitutive
expression of SOCS family members in a limited number of tissues.

(B) Northern analysis of mRNA from liver and M1 cells showing induction of expression of
SOCS family members following exposure to IL-6.

(C) Reverse transcriptase PCR analysis of mRNA from bone marrow showing induction of
expression of SOCS family members by a range of cytokines.

Figure 12 is a photographic representation showing that SOCS1 suppresses the phosphorylation
and activation of gp130 and STAT3.

(A) Western blots of extracts from parental M1 cells (M1 and M1.mpl) and M1 cells
expressing SOCS1 (4A2 and M1.mpl.SOCS1) stimulated with (+) or without (-) 100 ng/ml IL-6.
Top: Extracts immunoprecipitated with anti-gp130 (αgp130) and immunoblotted with
anti-phosphotyrosine (αPY-STAT3), or for STAT3 (αSTAT3) to demonstrate equal loading of
protein. The molecular weights of the bands are shown on the right.

(B) EMSA of M1.mpl and M1.mpl.SOCS1 cells stimulated with (+) and without (-) 100
ng/ml IL-6 or 100 ng/ml IFN-γ. The DNA-binding complexes SIF A, B and C are indicated at
the left.

Figure 13 is a representation of a comparison of the amino acid sequences of the SOCS proteins.
(A) Schematic representation of structures of SOCS proteins, including proteins which contain
WD-40 repeats (WSB) and ankyrin repeats (ASB). (B) Alignment of N-terminal regions of
SOCS proteins. (C) Alignment of the SH2 domains of CIS, SOCS1, 2, 3, 5, 9, 11 and 14. (D)
Alignment of the WD-40 repeats of SOCS4, SOCS6, SOCS13 and SOCS15. (E) Alignment of
the ankyrin repeats of SOCS7 and SOCS10. (F) Alignment of the regions between the SH2
domain, WD-40 repeats or ankyrin repeats and the SOCS box. (G) Alignment of the SOCS box.
In each case the conventional one letter abbreviations for amino acids are used, with X denoting
residues of uncertain identity and OOO denoting the beginning and the end of contigs. Amino
acid sequence obtained from conceptual translation of nucleic acid sequence derived from
isolated cDNAs is shown in upper case, while amino acid sequence obtained by conceptual
translation of ESTs is shown in lower case and is approximate only. Conserved residues, defined
as (LIVMA), (FYW), (DE), (QN), (C, S, T), (KRH), (PG), are shaded in the SH2 domain, WD-40
repeats, ankyrin repeats and the SOCS box. For the alignment of SH2 domains, WD-40 repeats
and ankyrin repeats a consensus sequence is shown above. In each case this has been derived
from examination of a large and diverse set of domains (Neer et al, 1994; Bork, 1993).

Figures 14(A) and (B) are photographic representations showing analysis of mRNA expression
of mouse SOCS1 and SOCS5 and of SOCS proteins containing a WD-40 repeat (WSB2) and
ankyrin repeats (ASB1).

Figure 15 is a representation showing the nucleotide sequence of the mouse SOCS4 cDNA. The
nucleotides encoding the mature coding region from the predicted ATG "start" codon to the stop
codon are shown in upper case, while the predicted 5' and 3' untranslated regions are shown in
lower case. The relationship of the mouse cDNA sequence to mouse and human EST contigs is
illustrated in Figure 17.
Figure 16 is a representation showing the predicted amino acid sequence of the mouse SOCS4
protein, derived from the nucleotide sequence in Figure 15. The SOCS box, which is also shown
in Figure 13, is underlined.

Figure 18 is a representation showing the nucleotide sequence of human SOCS4 cDNA contigs
h4.1 and h4.2, derived from analysis of ESTs listed in Table 4.1. The relationship of these
contigs to the mouse cDNA sequence is illustrated in Figure 17.

Figure 19 is a diagrammatic representation showing the relationship of mouse SOCS5 genomic
(57-2) and cDNA (5-3-2) clones to contigs derived from analysis of mouse ESTs (Table 5.1) and
human cDNA clone (5-94-2) and ESTs (Table 5.2). The nucleotide sequence of the mouse
SOCS5 contig is shown in Figure 20, with the sequence of the human SOCS5 contig (h5.1) being
shown in Figure 21. The deduced amino acid sequence of mouse SOCS5 is shown in Figure
20B. The structure of the protein is shown schematically, with the SH2 domain indicated by
( ) and the SOCS box by ( ). The putative 5' and 3' untranslated regions are shown by the thin
solid line.

Figure 20A is a representation showing the nucleotide sequence of mouse SOCS5 derived
from analysis of genomic and cDNA clones. The nucleotides encoding the mature coding region
from the predicted ATG "start" codon to the stop codon are shown in upper case, while the
predicted 5' and 3' untranslated regions are shown in lower case. The relationship of the mouse
cDNA sequence to mouse and human EST contigs is illustrated in Figure 19.

Figure 20B is a representation of the predicted amino acid sequence of the mouse SOCS5 protein,
derived from the nucleotide sequence in Figure 20A. The SOCS box, which is also shown in
Figure 13, is underlined.

Figure 21 is a representation showing the nucleotide sequence of human SOCS5 cDNA contig
h5.1, derived from analysis of cDNA clone 5-94-2 and the ESTs listed in Table 5.2. The
relationship of this contig to the mouse cDNA sequence is illustrated in Figure 19.
Figure 22 is a diagrammatic representation showing the relationship of mouse SOCS6 cDNA
clones (6-1A, 6-2A, 6-5B, 6-4N, 6-18, 6-29, 6-3N and 6-5N) to contigs derived from analysis
of mouse ESTs (Table 6.1) and human ESTs (Table 6.2). The nucleotide sequence of the mouse
SOCS6 contig is shown in Figure 23, with the sequence of human SOCS6 contigs (h6.1 and
h6.2) being shown in Figure 24. The deduced amino acid sequence of mouse SOCS6 is shown
in Figure 23B. The structure of the protein is shown schematically, with the WD-40 repeats
indicated by ( ) and the SOCS box by ( ). The putative 5' and 3' untranslated regions are
shown by the thin solid line.

Figure 23A is a representation showing the nucleotide sequence of mouse SOCS6 derived
from analysis of cDNA clone 64-10A-11. The nucleotides encoding part of the predicted
coding region, ending in the stop codon, are shown in upper case, while the predicted 3'
untranslated regions are shown in lower case. The relationship of the mouse cDNA sequence to
mouse and human EST contigs is illustrated in Figure 22.

Figure 23B is a representation showing the predicted amino acid sequence of the mouse SOCS6
protein, derived from the nucleotide sequence in Figure 23A. The SOCS box, which is also shown
in Figure 13, is underlined.

Figure 24 is a representation showing the nucleotide sequence of human SOCS6 cDNA contig
h6.1, derived from analysis of cDNA clone 5-94-2 and the ESTs listed in Table 6.2. The
relationship of these contigs to the mouse cDNA sequence is illustrated in Figure 22.

Figure 25 is a diagrammatic representation showing the relationship of mouse SOCS7 cDNA
clone (74-10A-11) to contigs derived from analysis of mouse ESTs (Table 7.1) and human ESTs
(Table 7.2). The nucleotide sequence of the mouse SOCS7 contig is shown in Figure 26, with
the sequence of human SOCS7 contigs (h7.1 and h7.2) being shown in Figure 27. The deduced
amino acid sequence of mouse SOCS7 is shown in Figure 26B. The structure of the protein is
shown schematically, with the ankyrin repeats indicated by ( ) and the SOCS box by ( ). The
putative 5' and 3' untranslated regions are shown by the thin solid line in the mouse and by the
wavy line in h7.2. Based on analysis of clones isolated to date and ESTs, the 3' untranslated
regions of mSOCS7 and hSOCS7 share little similarity.

Figure 26A is a representation showing the nucleotide sequence of mouse SOCS7 derived
from analysis of cDNA clone 74-10A-11. The nucleotides encoding part of the predicted
coding region, ending in the stop codon, are shown in upper case, while the predicted 3'
untranslated regions are shown in lower case. The relationship of the mouse cDNA sequence to
mouse and human EST contigs is illustrated in Figure 25.

Figure 26B is a representation showing the predicted amino acid sequence of the mouse SOCS7
protein, derived from the nucleotide sequence in Figure 26A. The SOCS box, which is also shown
in Figure 13, is underlined.

Figure 27 is a representation showing the nucleotide sequence of human SOCS7 cDNA contigs
h7.1 and h7.2, derived from analysis of the ESTs listed in Table 7.2. The relationship of these
contigs to the mouse cDNA sequence is illustrated in Figure 25.

Figure 28 is a diagrammatic representation of the relationship of sequence derived from analysis
of mouse SOCS8 ESTs (Table 8.1 and Figure 29A) to the predicted protein structure of mouse
SOCS8. The deduced partial amino acid sequence of mouse SOCS8 is shown in Figure 29B.
The structure of the protein is shown schematically with the SOCS box highlighted ( ). The
predicted 3' untranslated region is shown by the thin line.

Figure 29A is a representation showing the partial nucleotide sequence of mouse SOCS8 cDNA
(contig 8.1) derived from analysis of ESTs. The nucleotides encoding part of the predicted
coding region, ending in the stop codon, are shown in upper case, while the predicted 3'
untranslated regions are shown in lower case.

Figure 29B is a representation showing the partial predicted amino acid sequence of the mouse
SOCS8 protein, derived from the nucleotide sequence in Figure 29A. The SOCS box, which is
also shown in Figure 13, is underlined.
Figure 30 is a diagrammatic representation showing the relationship of mouse SOCS9 ESTs
(Table 9.1) and human SOCS9 ESTs (Table 9.2). The nucleotide sequence of the mouse SOCS9
contig (m9.1) is shown in Figure 31, with the sequence of human SOCS9 contig (h9.1) being
shown in Figure 32. The deduced amino acid sequence of human SOCS9 is shown
schematically, with the SH2 domain indicated by ( ) and the SOCS box by ( ). The putative 3'
untranslated region is shown by the thin solid line.

Figure 31 is a representation showing the partial nucleotide sequence of mouse SOCS9 cDNA
(contig m9.1), derived from analysis of the ESTs listed in Table 9.1. The relationship of these
contigs to the mouse cDNA sequence is illustrated in Figure 30.

Figure 32 is a representation showing the partial nucleotide sequence of human SOCS9 cDNA
(contig h9.1), derived from analysis of the ESTs listed in Table 9.2. Although it is clear that
contig h9.1 encodes a protein with an SH2 domain and a SOCS box, the quality of the sequence
is not high enough to derive a single unambiguous open reading frame. The relationship of these
contigs to the mouse cDNA sequence is illustrated in Figure 30.

Figure 33 is a representation showing the relationship of mouse SOCS10 cDNA clones (10-9,
10-12, 10-23 and 10-24) to contigs derived from analysis of mouse ESTs (Table 10.1) and
human ESTs (Table 10.2). The nucleotide sequence of the mouse SOCS10 contig is shown in
Figure 34, with the sequence of human SOCS10 contigs (h10.1 and h10.2) being shown in
Figure 35. The predicted structure of the protein is shown schematically, with the ankyrin
repeats indicated by ( ) and the SOCS box by ( ). The putative 3' untranslated regions are shown
by the thin solid line in the mouse and by the wavy line in h10.2. Based on analysis of clones
isolated to date and ESTs, the 3' untranslated regions of mSOCS10 and hSOCS10 share little
similarity.

Figure 34 is a representation showing the nucleotide sequence of mouse SOCS10 derived
from analysis of cDNA clones 10-9, 10-12, 10-23 and 10-24. The nucleotides encoding part
of the predicted coding region, ending in the stop codon, are shown in upper case, while the
predicted 3' untranslated regions are shown in lower case. Although it is clear that contig m10.1
encodes a protein with a series of ankyrin repeats and a SOCS box, the quality of the sequence
is not high enough to derive a single unambiguous open reading frame. The relationship of the
mouse cDNA sequence to mouse and human EST contigs is illustrated in Figure 33.

Figure 35 is a representation showing the nucleotide sequence of human SOCS10 cDNA contigs
h10.1 and h10.2, derived from analysis of the ESTs listed in Table 10.2. The relationship of these
contigs to the mouse cDNA sequence is illustrated in Figure 33.

Figure 36A is a representation showing the partial nucleotide sequence of the human SOCS11
cDNA derived from analysis of ESTs listed in Table 11.1. The nucleotides encoding the mature
coding region from the predicted ATG "start" codon to the stop codon are shown in upper case,
while the predicted 5' and 3' untranslated regions are shown in lower case. The relationship of
the partial cDNA sequence, derived from ESTs, to the predicted protein is shown in Figure 37.

Figure 36B is a representation showing the partial predicted amino acid sequence of the human
SOCS11 protein, derived from the nucleotide sequence in Figure 36A. The SOCS box, which is
also shown in Figure 13, is underlined.

Figure 37 is a diagrammatic representation showing the relationship of sequence derived from
analysis of human SOCS11 ESTs (Table 11.1 and Figure 36A) to the predicted protein structure
of human SOCS11. The deduced partial amino acid sequence of human SOCS11 is shown in
Figure 36B. The structure of the protein is shown schematically with the SH2 domain shown
by ( ) and the SOCS box highlighted by ( ). The predicted 3' untranslated region is shown by
the thin line.

Figure 38 is a diagrammatic representation showing the relationship of mouse SOCS12 cDNA
clone (12-1) to contigs derived from analysis of mouse ESTs (Table 12.1) and human ESTs
(Table 12.2). The nucleotide sequence of the mouse SOCS12 contig is shown in Figure 39,
with the sequence of human SOCS12 contigs (h12.1 and h12.2) being shown in Figure 40. The
deduced partial amino acid sequence of mouse SOCS12 is shown in Figure 39. The structure
of the protein is shown schematically, with the ankyrin repeats indicated by ( ) and the SOCS box
by ( ). The putative 3' untranslated region is shown by the thin solid line in the mouse and
by the wavy line in h12.2. Based on analysis of clones isolated to date and ESTs, the 3'
untranslated regions of mSOCS12 and hSOCS12 share little similarity.

Figure 39 is a representation showing the nucleotide sequence of mouse SOCS12 derived
from analysis of cDNA clone 12-1 and the ESTs listed in Table 12.1. The nucleotides encoding
part of the predicted coding region, including the stop codon, are shown in upper case, while
the predicted 3' untranslated region is shown in lower case. Although, by homology with human
SOCS12, it is clear that contig m12.1 encodes a protein with a series of ankyrin repeats and a
SOCS box, the quality of the sequence is not high enough to derive a single unambiguous open
reading frame. The relationship of the mouse cDNA sequence to mouse and human EST contigs
is illustrated in Figure 38.

Figure 40 is a representation showing the nucleotide sequence of human SOCS12 cDNA contigs
h12.1 and h12.2, derived from analysis of the ESTs listed in Table 12.2. The relationship of these
contigs to the mouse cDNA sequence is illustrated in Figure 38.

Figure 41 is a diagrammatic representation showing the relationship of contig m13.1, derived
from analysis of mouse SOCS13 cDNA clones (62-1, 62-6-7, 62-14) and mouse ESTs (Table
13.1), to contig h13.1, derived from analysis of human ESTs (Table 13.2). The nucleotide
sequence of the mouse SOCS13 contig is shown in Figure 42, with the sequence of the human
SOCS13 contig (h13.1) being shown in Figure 43. The deduced amino acid sequence of mouse
SOCS13 is shown in Figure 42B. The structure of the protein is shown schematically, with the
WD-40 repeats highlighted by ( ) and the SOCS box highlighted by ( ). The 3' untranslated
region is shown by the thin solid line.

Figure 42A is a representation showing the nucleotide sequence of mouse SOCS13 derived
from analysis of cDNA clones 62-1, 62-6-7 and 62-14. The nucleotides encoding part of the
predicted coding region, ending in the stop codon, are shown in upper case, while those encoding
the predicted 3' untranslated regions are shown in lower case. The relationship of the mouse
cDNA sequence to mouse and human EST contigs is illustrated in Figure 41.
Figure 42B is a representation showing the predicted amino acid sequence of the mouse SOCS13
protein, derived from the nucleotide sequence in Figure 42A. The SOCS box, which is also shown
in Figure 13, is underlined.

Figure 43 is a representation showing the nucleotide sequence of human SOCS13 cDNA contig
h13.1 derived from analysis of the ESTs listed in Table 13.2. The relationship of these contigs
to the mouse cDNA sequence is illustrated in Figure 41.

Figure 44 is a diagrammatic representation showing the relationship of a partial mouse SOCS14
cDNA clone (14-1) to contigs derived from analysis of mouse ESTs (Table 14.1). The
nucleotide sequence of the mouse SOCS14 contig is shown in Figure 45. The deduced partial
amino acid sequence of mouse SOCS14 is shown in Figure 45B. The structure of the protein
is shown schematically, with the SH2 domain indicated by ( ) and the SOCS box by ( ). The
putative 3' untranslated region is shown by the thin line.

Figure 45A is a representation showing the nucleotide sequence of mouse SOCS14 derived
from analysis of genomic and cDNA clones. The nucleotides encoding the mature coding region
from the predicted ATG "start" codon to the stop codon are shown in upper case, while the
predicted 5' and 3' untranslated regions are shown in lower case. The relationship of the mouse
cDNA sequence to mouse and human EST contigs is illustrated in Figure 44.

Figure 45B is a representation showing the predicted amino acid sequence of the mouse SOCS14
protein, derived from the nucleotide sequence in Figure 45A. The SOCS box, which is also shown
in Figure 13, is underlined.

Figure 46 is a diagrammatic representation showing the relationship of contig m15.1, derived
from analysis of the mouse BAC and mouse ESTs (Table 15.1), to contig h15.1, derived from
analysis of the human BAC and human ESTs (Table 15.2). The nucleotide sequence of the mouse
SOCS15 contig is shown in Figure 47, with the sequence of the human SOCS15 contig (h15.1)
being shown in Figure 48. The deduced amino acid sequence of mouse SOCS15 is shown in
Figure 47B. The structure of the protein is shown schematically, with the WD-40 repeats
highlighted by ( ) and the SOCS box highlighted by ( ). The 5' and 3' untranslated regions are
shown by the thin solid line. The introns which interrupt the coding region are shown by

Figure 47A is a representation showing the nucleotide sequence covering the mouse SOCS15
gene derived from analysis of the mouse BAC listed in Table 15.1. The nucleotides encoding the
predicted coding region, beginning with the ATG and ending in the stop codon, are shown in
upper case, while those encoding the predicted 5' untranslated region, the introns and the 3'
untranslated region are shown in lower case. The relationship of the mouse BAC to mouse and
human EST contigs is illustrated in Figure 46.

Figure 47B is a representation showing the predicted amino acid sequence of the mouse SOCS15
protein, derived from the nucleotide sequence in Figure 47A. The SOCS box, which is also shown
in Figure 13, is underlined.

Figure 48A is a representation showing the nucleotide sequence covering the human SOCS15
gene derived from analysis of the human BAC listed in Table 15.2. The nucleotides encoding the
predicted coding region, beginning with the ATG and ending in the stop codon, are shown in
upper case, while those encoding the predicted 5' untranslated region, the introns and the 3'
untranslated region are shown in lower case. The relationship of the human BAC to mouse and
human EST contigs is illustrated in Figure 46.

Figure 48B is a representation showing the predicted amino acid sequence of the human SOCS15
protein, derived from the nucleotide sequence in Figure 48A. The SOCS box, which is also shown
in Figure 13, is underlined.

Figure 49 is a photographic representation showing SOCS1 inhibition of JAK2 kinase activity.
(A) Upper panel. Cos M6 cells were transiently transfected with either Flag-tagged mJAK2 and
mSOCS-1 DNA (SOCS1) or Flag-mJAK2 DNA alone (-) and lysed, and JAK2 proteins were
immunoprecipitated using anti-JAK2 antibody and subjected to an in vitro kinase assay. Lower
panel. A portion of each JAK2 immunoprecipitate was Western blotted with anti-JAK2
antibody. (B) Upper panel. Cos M6 cells were transiently transfected with Flag-mJAK2 and
Flag-mSOCS-1 DNA or Flag-mJAK2 DNA alone and lysed, and JAK2 proteins were
immunoprecipitated using anti-JAK2 (UBI) and separated by SDS/PAGE. Immunoprecipitates
were then analysed by Western blot with anti-phosphotyrosine antibody. Lower panel. JAK2
expression: Cos cell lysates were separated by SDS/PAGE and analysed by Western blot with
anti-FLAG antibody (M2).

Figure 50 is a photographic representation showing interaction between JAK2 and SOCS
proteins. (A) Cos M6 cells were transiently transfected with Flag-tagged mJAK2 and various
Flag-tagged SOCS DNAs (SOCS-1, S1; SOCS-2, S2; SOCS-3, S3; CIS) or Flag-mJAK2 alone
and lysed, and JAK2 proteins were immunoprecipitated using anti-JAK2 (UBI) and separated by
SDS/PAGE. Immunoprecipitates were then analysed by Western blot with anti-FLAG antibody
(M2). (B) Cos cell lysates described in (A) were separated by SDS/PAGE and expression levels
of the various proteins were determined by Western blot with anti-FLAG antibody (M2). (C)
JAK2 tyrosine phosphorylation. Cos cell lysates described in (A) were separated by SDS/PAGE
and proteins analysed by Western blot with anti-phosphotyrosine antibody.

Figure 51 is a diagrammatic representation of pβgalpAloxneo.

Figure 52 is a diagrammatic representation of pβgalpAloxneoTK.

Figure 53 is a diagrammatic representation of the SOCS1 knockout construct.
DETAILED DESCRIPTION OF PREFERRED EMBODIMENTS

The present invention provides a new family of modulators of signal transduction. As the initial
members of this family suppressed cytokine signalling, the family is referred to as the
"suppressors of cytokine signalling" family or "SOCS". The SOCS family is defined by the
presence of a C-terminal domain referred to as a "SOCS box". Different classes of SOCS
molecules are defined by a motif, generally but not exclusively located N-terminal to the SOCS
box, which is involved in protein:molecule interaction such as protein:DNA or
protein:protein interaction. Particularly preferred motifs are selected from an SH2 domain,
WD-40 repeats and ankyrin repeats.

WD-40 repeats were originally recognised in the β-subunit of G-proteins. WD-40 repeats appear
to form a β-propeller-like structure and may be involved in protein-protein interactions. Ankyrin
repeats were originally recognised in the cytoskeletal protein ankyrin.

Members of the SOCS family may be identified by any number of means. For example, SOCS1
to SOCS3 were identified by their ability to suppress cytokine-mediated signal transduction and,
hence, were identified based on activity. SOCS4 to SOCS15 were identified as nucleotide
sequences exhibiting similarity at the level of the SOCS box.

The SOCS box is a conserved motif located in the C-terminal region of the SOCS molecule. In 
accordance with the present invention, the amino acid sequence of the SOCS box is: 

$$X_1 X_2 X_3 X_4 X_5 X_6 X_7 X_8 X_9 X_{10} X_{11} X_{12} X_{13} X_{14} X_{15} X_{16} [X_i]_n X_{17} X_{18} X_{19} X_{20} X_{21} X_{22} X_{23} [X_j]_n X_{24} X_{25} X_{26} X_{27} X_{28}$$

wherein: $X_1$ is L, I, V, M, A or P;

$X_2$ is any amino acid residue;
$X_3$ is P, T or S;
$X_4$ is L, I, V, M, A or P;
$X_5$ is any amino acid;
$X_6$ is any amino acid;
$X_7$ is L, I, V, M, A, F, Y or W;
$X_8$ is C, T or S;
$X_9$ is R, K or H;
$X_{10}$ is any amino acid;
$X_{11}$ is any amino acid;
$X_{12}$ is L, I, V, M, A or P;
$X_{13}$ is any amino acid;
$X_{14}$ is any amino acid;
$X_{15}$ is any amino acid;
$X_{16}$ is L, I, V, M, A, P, G, C, T or S;
$[X_i]_n$ is a sequence of n amino acids wherein n is from 1 to 50 amino acids
and wherein the sequence $X_i$ may comprise the same or different amino
acids selected from any amino acid residue;
$X_{17}$ is L, I, V, M, A or P;
$X_{18}$ is any amino acid;
$X_{19}$ is any amino acid;
$X_{20}$ is L, I, V, M, A or P;
$X_{21}$ is P;
$X_{22}$ is L, I, V, M, A, P or G;
$X_{23}$ is P or N;
$[X_j]_n$ is a sequence of n amino acids wherein n is from 1 to 50 amino acids
and wherein the sequence $X_j$ may comprise the same or different amino
acids selected from any amino acid residue;
$X_{24}$ is L, I, V, M, A or P;
$X_{25}$ is any amino acid;
$X_{26}$ is any amino acid;
$X_{27}$ is Y or F; and
$X_{28}$ is L, I, V, M, A or P.
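Because every position of the SOCS box consensus above is either a fixed residue class or a spacer of bounded length, the definition can be expressed as a regular expression over the one-letter amino acid code. The following is a minimal sketch, assuming the definitions are applied literally (each fixed position restricted to the listed residues, and each of the two variable-length spacers being 1 to 50 arbitrary residues). The names `SOCS_BOX` and `find_socs_box` are hypothetical and for illustration only; a pattern hit merely flags a candidate and is not presented as the method by which family members were identified.

```python
import re

# Residue classes taken from the SOCS box definition.
LIVMAP = "[LIVMAP]"  # L, I, V, M, A or P
ANY = "."            # any amino acid

# One entry per position X1..X28, plus the two variable spacers.
SOCS_BOX_PARTS = [
    LIVMAP,            # X1
    ANY,               # X2
    "[PTS]",           # X3
    LIVMAP,            # X4
    ANY, ANY,          # X5, X6
    "[LIVMAFYW]",      # X7
    "[CTS]",           # X8
    "[RKH]",           # X9
    ANY, ANY,          # X10, X11
    LIVMAP,            # X12
    ANY, ANY, ANY,     # X13, X14, X15
    "[LIVMAPGCTS]",    # X16
    ANY + "{1,50}",    # first spacer, 1 to 50 residues
    LIVMAP,            # X17
    ANY, ANY,          # X18, X19
    LIVMAP,            # X20
    "P",               # X21
    "[LIVMAPG]",       # X22
    "[PN]",            # X23
    ANY + "{1,50}",    # second spacer, 1 to 50 residues
    LIVMAP,            # X24
    ANY, ANY,          # X25, X26
    "[YF]",            # X27
    LIVMAP,            # X28
]
SOCS_BOX = re.compile("".join(SOCS_BOX_PARTS))


def find_socs_box(protein: str):
    """Return (start, end) of the leftmost SOCS-box-like match, or None."""
    match = SOCS_BOX.search(protein.upper())
    return match.span() if match else None


# Usage with a synthetic, made-up sequence (not a real SOCS protein).
box = "LAPLAAFCRAALAAAG" + "EEEE" + "LAAVPLP" + "EEE" + "LAAYL"
print(find_socs_box("MK" + box + "QQ"))  # (2, 37)
```

The definition also places the SOCS box in the C-terminal region of the protein; this sketch does not check position, so a real scan would additionally need to confirm that a hit lies near the C-terminus.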






As stated above and in accordance with the present invention, SOCS proteins are divided into
separate classes based on the presence of a protein:molecule interacting region, such as but not
limited to an SH2 domain, WD-40 repeats and ankyrin repeats, located N-terminal of the SOCS
box. The latter three domains are protein:protein interacting domains.

Examples of SH2-containing SOCS proteins include SOCS1, SOCS2, SOCS3, SOCS5, SOCS9,
SOCS11 and SOCS14. Examples of SOCS containing WD-40 repeats include SOCS4, SOCS6
and SOCS15. Examples of SOCS containing ankyrin repeats include SOCS7, SOCS10 and
SOCS12.
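The class scheme above amounts to a lookup from the N-terminal protein:protein interacting domain to the family members that carry it. As a minimal sketch, the hypothetical helper `socs_class` below encodes only the examples named in the preceding paragraph; because those lists are examples rather than an exhaustive assignment, members not named there (for instance SOCS8) return `None`.

```python
# Hypothetical lookup built from the examples listed in the text.
# These are examples only, not a complete assignment of every SOCS protein.
SOCS_CLASSES = {
    "SH2 domain": ("SOCS1", "SOCS2", "SOCS3", "SOCS5", "SOCS9", "SOCS11", "SOCS14"),
    "WD-40 repeats": ("SOCS4", "SOCS6", "SOCS15"),
    "ankyrin repeats": ("SOCS7", "SOCS10", "SOCS12"),
}


def socs_class(name: str):
    """Return the N-terminal domain defining the class of `name`, or None if unlisted."""
    for domain, members in SOCS_CLASSES.items():
        if name in members:
            return domain
    return None


print(socs_class("SOCS6"))  # WD-40 repeats
```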

The present invention provides inter alia nucleic acid molecules encoding SOCS proteins,
purified naturally occurring SOCS proteins as well as recombinant forms of SOCS proteins, and
methods of modulating signal transduction by modulating activity of SOCS proteins or
expression of SOCS genes. Preferably, signal transduction is mediated by a cytokine, examples
of which include EPO, TPO, G-CSF, GM-CSF, IL-3, IL-2, IL-4, IL-7, IL-13, IL-6, LIF, IL-12,
IFN-γ, TNFα, IL-1 and/or M-CSF. Particularly preferred cytokines include IL-6, LIF, OSM,
IFN-γ and/or thrombopoietin.

Accordingly, one aspect of the present invention provides an isolated nucleic acid molecule
comprising a sequence of nucleotides encoding, or complementary to a sequence encoding, a
protein or a derivative, homologue, analogue or mimetic thereof, or comprising a nucleotide
sequence capable of hybridizing thereto under low stringency conditions at 42°C, wherein said
protein comprises a SOCS box in its C-terminal region and optionally a protein:molecule
interacting domain N-terminal of the SOCS box.

Preferably, the protein:molecule interacting domain is a protein:DNA or protein:protein
interacting domain. Most preferably, the protein:molecule interacting domain is one of an SH2
domain, WD-40 repeats and/or ankyrin repeats.

As stated above, the subject SOCS molecules preferably modulate cytokine-mediated signal
transduction. The present invention extends, however, to SOCS molecules modulating signal
transduction mediated by other effectors, such as other endogenous or exogenous molecules,
antigens, microbes and microbial products, viruses or components thereof, ions, hormones and
parasites. Endogenous molecules in this context are molecules produced within the cell carrying
the SOCS molecule. Exogenous molecules are produced by other cells or are introduced to the
body.

Preferably, the nucleic acid molecule or SOCS protein is in isolated or purified form. The terms
"isolated" and "purified" mean that a molecule has undergone at least one purification step away
from other material.

Preferably, the nucleic acid molecule is in isolated form and is DNA such as cDNA or genomic
DNA. The DNA may encode the same amino acid sequence as the naturally occurring SOCS
or the SOCS may contain one or more amino acid substitutions, deletions and/or additions. The
nucleotide sequence may correspond to the genomic coding sequence (including exons and
introns) or to the nucleotide sequence in cDNA from mRNA transcribed from the genomic gene
or it may carry one or more nucleotide substitutions, deletions and/or additions thereto.


In a preferred embodiment, the nucleic acid molecule comprises a sequence of nucleotides
encoding or complementary to a sequence encoding a SOCS protein or a derivative, homologue,
analogue or mimetic thereof wherein the amino acid sequence of said SOCS protein is selected
from SEQ ID NO:4 (mSOCS1), SEQ ID NO:6 (mSOCS2), SEQ ID NO:8 (mSOCS3), SEQ ID
NO:10 (hSOCS1), SEQ ID NO:12 (rSOCS1), SEQ ID NO:14 (mSOCS4), SEQ ID NO:18
(mSOCS5), SEQ ID NO:21 (mSOCS6), SEQ ID NO:25 (mSOCS7), SEQ ID NO:29
(mSOCS8), SEQ ID NO:36 (hSOCS11), SEQ ID NO:41 (mSOCS13), SEQ ID NO:44
(mSOCS14), SEQ ID NO:46 (mSOCS15) and SEQ ID NO:48 (hSOCS15) or encodes an amino
acid sequence with a single or multiple amino acid substitution, deletion and/or addition to the
listed sequences or is a nucleotide sequence capable of hybridizing to the nucleic acid molecule
under low stringency conditions at 42°C.

In an even more preferred embodiment, the present invention provides a nucleic acid molecule
comprising a sequence of nucleotides encoding or complementary to a sequence encoding a
SOCS protein or a derivative, homologue, analogue or mimetic thereof wherein the nucleotide
sequence is selected from a nucleotide sequence substantially set forth in SEQ ID NO:3
(mSOCS1), SEQ ID NO:5 (mSOCS2), SEQ ID NO:7 (mSOCS3), SEQ ID NO:9 (hSOCS1),
SEQ ID NO:11 (rSOCS1), SEQ ID NO:13 (mSOCS4), SEQ ID NO:15 and SEQ ID NO:16
(hSOCS4), SEQ ID NO:17 (mSOCS5), SEQ ID NO:19 (hSOCS5), SEQ ID NO:20 (mSOCS6),
SEQ ID NO:22 and SEQ ID NO:23 (hSOCS6), SEQ ID NO:24 (mSOCS7), SEQ ID NO:26 and
SEQ ID NO:27 (hSOCS7), SEQ ID NO:28 (mSOCS8), SEQ ID NO:30 (mSOCS9), SEQ ID
NO:31 (hSOCS9), SEQ ID NO:32 (mSOCS10), SEQ ID NO:33 and SEQ ID NO:34
(hSOCS10), SEQ ID NO:35 (hSOCS11), SEQ ID NO:37 (mSOCS12), SEQ ID NO:38 and
SEQ ID NO:39 (hSOCS12), SEQ ID NO:40 (mSOCS13), SEQ ID NO:42 (hSOCS13), SEQ
ID NO:43 (mSOCS14), SEQ ID NO:45 (mSOCS15) and SEQ ID NO:47 (hSOCS15) or a
nucleotide sequence having at least about 15% similarity to all or a region of any of the listed
sequences or a nucleic acid molecule capable of hybridizing to any of the listed sequences under
low stringency conditions at 42°C.

Reference herein to a low stringency at 42°C includes and encompasses from at least about 1%
v/v to at least about 15% v/v formamide and from at least about 1M to at least about 2M salt for
hybridisation, and at least about 1M to at least about 2M salt for washing conditions. Alternative
stringency conditions may be applied where necessary, such as medium stringency, which
includes and encompasses from at least about 16% v/v to at least about 30% v/v formamide and
from at least about 0.5M to at least about 0.9M salt for hybridisation, and at least about 0.5M
to at least about 0.9M salt for washing conditions, or high stringency, which includes and
encompasses from at least about 31% v/v to at least about 50% v/v formamide and from at least
about 0.01M to at least about 0.15M salt for hybridisation, and at least about 0.01M to at least
about 0.15M salt for washing conditions.
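To make the three bands easier to compare, the ranges above can be encoded as a simple lookup. The following Python sketch is illustrative only and is not part of the specification: it assumes the "at least about" bounds can be treated as hard inclusive limits, uses one salt range for both hybridisation and washing (as the text does for each band), and the name `classify_stringency` is hypothetical. Conditions falling in the gaps between bands (for example 15.5% formamide) return `None`.

```python
from typing import Optional

# (formamide % v/v range, salt molarity range) for each band, as stated above.
# Treating the "at least about" bounds as hard inclusive limits is an assumption.
STRINGENCY_BANDS = {
    "low": ((1.0, 15.0), (1.0, 2.0)),
    "medium": ((16.0, 30.0), (0.5, 0.9)),
    "high": ((31.0, 50.0), (0.01, 0.15)),
}


def classify_stringency(formamide_pct: float, salt_molar: float) -> Optional[str]:
    """Return the band whose formamide and salt ranges both contain the inputs."""
    for band, ((f_lo, f_hi), (s_lo, s_hi)) in STRINGENCY_BANDS.items():
        if f_lo <= formamide_pct <= f_hi and s_lo <= salt_molar <= s_hi:
            return band
    # Outside every band, or formamide and salt point to different bands.
    return None


print(classify_stringency(10, 1.5))  # low
print(classify_stringency(45, 0.1))  # high
print(classify_stringency(45, 1.5))  # None
```

Returning `None` for mixed conditions (high formamide with low-stringency salt) keeps the sketch from silently picking one parameter over the other.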

In another embodiment, the present invention is directed to a SOCS protein or a derivative,
homologue, analogue or mimetic thereof wherein said SOCS protein is identified as follows:

human SOCS-4 characterised by EST81149, EST180909, EST182619, ya99h09,
ye70c04, yh53c09, yh77g11, yh87h05, yi45h07, yj04e06, yq12h06, yq56a06, yq60e02,
yq92g03, yq97h06, yr90f01, yt69c03, yv30a08, yv55f07, yv57h09, yv87h02, yv98e11,
yw68d10, yw82a03, yx08a07, yx72h06, yx76b09, yy37h08, yy66b02, za81f08, zb18f07,
zc06e08, zd14g06, zd51h12, zd52b09, ze25g11, ze69f02, zf54f03, zh96e07, zv66h12,
zs83a08 and zs83g08;

mouse SOCS-4 characterised by mc65f04, mf42e06, mp10c10, mr81g09 and mt19h12;

human SOCS-5 characterised by EST15B103, EST15B105, EST27530 and zf50f01;

mouse SOCS-5 characterised by mc55a01, mh98f09, my26h12 and ve24e06;

human SOCS-6 characterised by yf61e08, yf93a09, yg05f12, yg41f04, yg45c02,
yh11f10, yh13b05, zc35a12, ze02h08, zl09a03, zl69e10, zn39d08 and zo39e06;

mouse SOCS-6 characterised by mc04c05, md48ap3, mf31d03, mh26b07, mh78e11,
mh88h09, mh94h07, mi27h04, mj29c05, mp66g04, mw75g03, va53b05, vb34h02,
vc55d07, vc59e05, vc67d03, vc68d10, vc97h01, vc99c08, vd07h03, vdOScOl, vd09b12,
vd19b02, vd29a04 and vd46d06;

human SOCS-7 characterised by STS WI30171, EST00939, EST 129 13, yc29b05,
yp49f10, zt10f03 and zx73g04;

mouse SOCS-7 characterised by mj39a01 and vi52h07;

mouse SOCS-8 characterised by mj6e09 and vj27a029;

human SOCS-9 characterised by CSRL-82f2-u, EST114054, yy06b07, yy06g06,
zr40c09, zr72h01, yx92c08, yx93b08 and hfe0662;

mouse SOCS-9 characterised by me65d05;

human SOCS-10 characterised by aa48h10, zp35h01, zp97h12, zqOShOl, zr34g05,
EST73000 and HSDHEI005;

SUBSTITUTE SHEET (Rule 26) 



wo 98/20023 



PCT/AU97/00729 




mouse SOCS-10 characterised by mb14d12, mb40f06, mg89b11, mq89e12, mp03g12
and vh53c11;

human SOCS-11 characterised by zt24h06 and zr43b02;

human SOCS-13 characterised by EST59161;

mouse SOCS-13 characterised by ma39a09, me60c05, mi78g05, mk10c11, mo48g12,
mp94a01, vb57c07 and vhOTcl 1; and

human SOCS-14 characterised by nu75e03, vd29h11 and vd53g07;
or a derivative or homologue of the above ESTs characterised by a nucleic acid molecule
being capable of hybridizing to any of the listed ESTs under low stringency conditions
at 42°C.

In another embodiment, the nucleotide sequence encodes the following amino acid sequence: 



$$X_1\,X_2\,X_3\,X_4\,X_5\,X_6\,X_7\,X_8\,X_9\,X_{10}\,X_{11}\,X_{12}\,X_{13}\,X_{14}\,X_{15}\,X_{16}\,[X_i]_n\,X_{17}\,X_{18}\,X_{19}\,X_{20}\,X_{21}\,X_{22}\,X_{23}\,[X_j]_n\,X_{24}\,X_{25}\,X_{26}\,X_{27}\,X_{28}$$

wherein:
X1 is L, I, V, M, A or P;
X2 is any amino acid residue;
X3 is P, T or S;
X4 is L, I, V, M, A or P;
X5 is any amino acid;
X6 is any amino acid;
X7 is L, I, V, M, A, F, Y or W;
X8 is C, T or S;
X9 is R, K or H;
X10 is any amino acid;
X11 is any amino acid;
X12 is L, I, V, M, A or P;
X13 is any amino acid;
X14 is any amino acid;
X15 is any amino acid;
X16 is L, I, V, M, A, P, G, C, T or S;
[Xi]n is a sequence of n amino acids wherein n is from 1 to 50 amino acids
and wherein the sequence Xi may comprise the same or different amino
acids selected from any amino acid residue;
X17 is L, I, V, M, A or P;
X18 is any amino acid;
X19 is any amino acid;
X20 is L, I, V, M, A or P;
X21 is P;
X22 is L, I, V, M, A, P or G;
X23 is P or N;
[Xj]n is a sequence of n amino acids wherein n is from 1 to 50 amino acids
and wherein the sequence Xj may comprise the same or different amino
acids selected from any amino acid residue;
X24 is L, I, V, M, A or P;
X25 is any amino acid;
X26 is any amino acid;
X27 is Y or F; and
X28 is L, I, V, M, A or P.
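Because every position in the consensus above is either a fixed residue class or a bounded spacer, the SOCS box definition maps directly onto a regular expression. The following is a minimal Python sketch, assuming one-letter amino acid codes and Python's standard `re` module; the names `SOCS_BOX_PATTERN` and `find_socs_box` are hypothetical and the sketch is an illustration, not a search method described in this specification. The example input is a synthetic string built to satisfy each position, not a real SOCS protein.

```python
import re
from typing import Optional, Tuple

HYD = "[LIVMAP]"  # L, I, V, M, A or P
ANY = "."         # any amino acid residue

# One regex fragment per position of the consensus X1 ... X28 defined above.
SOCS_BOX_PATTERN = re.compile(
    HYD + ANY + "[PTS]" + HYD + ANY + ANY    # X1-X6
    + "[LIVMAFYW]" + "[CTS]" + "[RKH]"       # X7-X9
    + ANY + ANY + HYD + ANY + ANY + ANY      # X10-X15
    + "[LIVMAPGCTS]"                         # X16
    + ".{1,50}"                              # [Xi]n, n = 1 to 50
    + HYD + ANY + ANY + HYD                  # X17-X20
    + "P" + "[LIVMAPG]" + "[PN]"             # X21-X23
    + ".{1,50}"                              # [Xj]n, n = 1 to 50
    + HYD + ANY + ANY + "[YF]" + HYD         # X24-X28
)


def find_socs_box(protein: str) -> Optional[Tuple[int, int]]:
    """Return the (start, end) span of the first consensus match, or None."""
    match = SOCS_BOX_PATTERN.search(protein.upper())
    return match.span() if match else None


# Synthetic sequence satisfying each position (not a real SOCS protein).
example = "LAPLAALCRAALAAAL" + "G" * 5 + "LAALPLP" + "G" * 5 + "LAAYL"
print(find_socs_box(example))  # (0, 38)
```

A regular expression only tests conformity to the consensus; it says nothing about whether a matching protein actually behaves as a SOCS.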



The above sequence comparisons are preferably to the whole molecule but may also be to part
thereof. Preferably, the comparisons are made to a contiguous series of at least about 21
nucleotides or at least about 5 amino acids. More preferably, the comparisons are made against
at least about 21 contiguous nucleotides or at least 7 contiguous amino acids. Comparisons may
also be made only to the SOCS box region or to a region encompassing the protein:molecule
interacting region, such as the SH2 domain, WD-40 repeats and/or ankyrin repeats.
Still another embodiment of the present invention contemplates an isolated polypeptide or a
derivative, homologue, analogue or mimetic thereof comprising a SOCS box in its C-terminal
region.

Preferably the polypeptide further comprises a protein:molecule interacting domain such as a
protein:DNA or protein:protein interacting domain. Preferably, this domain is located N-terminal
of the SOCS box. It is particularly preferred for the protein:molecule interacting domain to be
at least one of an SH2 domain, WD-40 repeats and/or ankyrin repeats.

Preferably, the signal transduction is mediated by a cytokine selected from EPO, TPO, G-CSF,
GM-CSF, IL-3, IL-2, IL-4, IL-7, IL-13, IL-6, LIF, IL-12, IFNγ, TNFα, IL-1 and/or M-CSF.
Preferred cytokines are IL-6, LIF, OSM, IFN-γ or thrombopoietin.

More preferably, the protein comprises a SOCS box having the amino acid sequence:

$$X_1\,X_2\,X_3\,X_4\,X_5\,X_6\,X_7\,X_8\,X_9\,X_{10}\,X_{11}\,X_{12}\,X_{13}\,X_{14}\,X_{15}\,X_{16}\,[X_i]_n\,X_{17}\,X_{18}\,X_{19}\,X_{20}\,X_{21}\,X_{22}\,X_{23}\,[X_j]_n\,X_{24}\,X_{25}\,X_{26}\,X_{27}\,X_{28}$$

wherein:
X1 is L, I, V, M, A or P;
X2 is any amino acid residue;
X3 is P, T or S;
X4 is L, I, V, M, A or P;
X5 is any amino acid;
X6 is any amino acid;
X7 is L, I, V, M, A, F, Y or W;
X8 is C, T or S;
X9 is R, K or H;
X10 is any amino acid;
X11 is any amino acid;
X12 is L, I, V, M, A or P;
X13 is any amino acid;
X14 is any amino acid;
X15 is any amino acid;
X16 is L, I, V, M, A, P, G, C, T or S;
[Xi]n is a sequence of n amino acids wherein n is from 1 to 50 amino acids
and wherein the sequence Xi may comprise the same or different amino
acids selected from any amino acid residue;
X17 is L, I, V, M, A or P;
X18 is any amino acid;
X19 is any amino acid;
X20 is L, I, V, M, A or P;
X21 is P;
X22 is L, I, V, M, A, P or G;
X23 is P or N;
[Xj]n is a sequence of n amino acids wherein n is from 1 to 50 amino acids
and wherein the sequence Xj may comprise the same or different amino
acids selected from any amino acid residue;
X24 is L, I, V, M, A or P;
X25 is any amino acid;
X26 is any amino acid;
X27 is Y or F; and
X28 is L, I, V, M, A or P.

Still another embodiment provides an isolated polypeptide or a derivative, homologue, analogue
or mimetic thereof comprising a sequence of amino acids substantially as set forth in SEQ ID
NO:4 (mSOCS1), SEQ ID NO:6 (mSOCS2), SEQ ID NO:8 (mSOCS3), SEQ ID NO:10
(hSOCS1), SEQ ID NO:12 (rSOCS1), SEQ ID NO:14 (mSOCS4), SEQ ID NO:18 (mSOCS5),
SEQ ID NO:21 (mSOCS6), SEQ ID NO:25 (mSOCS7), SEQ ID NO:29 (mSOCS8), SEQ ID
NO:36 (hSOCS11), SEQ ID NO:41 (mSOCS13), SEQ ID NO:44 (mSOCS14), SEQ ID NO:46
(mSOCS15) and SEQ ID NO:48 (hSOCS15) or an amino acid sequence having at least 15%
similarity to all or a part of the listed sequences.
Preferred nucleotide percentage similarities include at least about 20%, at least about 40%, at
least about 50%, at least about 60%, at least about 70%, at least about 80%, at least about 90%
or above such as 93%, 95%, 98% or 99%.

Preferred amino acid similarities include at least about 20%, at least about 30%, at least about
40%, at least about 50%, at least about 60%, at least about 70%, at least about 80%, at least
about 90%, at least about 95%, at least about 97% or 98% or above.

As stated above, similarity may be measured against an entire molecule or a region comprising
at least 21 nucleotides or at least 7 amino acids. Preferably, similarity is measured in a conserved
region such as an SH2 domain, WD-40 repeats, ankyrin repeats or other protein:molecule
interacting domains or a SOCS box.

The term "similarity" includes exact identity between sequences or, where the sequences differ,
a relationship between the different amino acids at the structural, functional, biochemical and/or
conformational levels.
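The percentage figures above depend on how "related" residues are scored. As a minimal sketch, assuming two segments that have already been aligned to equal length (the alignment step is not shown), identity and similarity can be counted position by position over a window of at least 7 amino acids. The residue groups treated as related here are an illustrative assumption, not groups defined in this specification; the function name is hypothetical, and gap positions ('-') count as mismatches.

```python
from typing import Tuple

# Illustrative conservative-substitution groups: an assumption for this sketch,
# not a scoring scheme defined in the specification.
SIMILAR_GROUPS = ("LIVM", "FYW", "KRH", "DE", "NQ", "ST", "AG")
GROUP_OF = {aa: i for i, group in enumerate(SIMILAR_GROUPS) for aa in group}


def percent_identity_and_similarity(a: str, b: str, min_len: int = 7) -> Tuple[float, float]:
    """Score two pre-aligned, equal-length protein segments ('-' marks a gap)."""
    if len(a) != len(b):
        raise ValueError("segments must already be aligned to equal length")
    if len(a) < min_len:
        raise ValueError(f"compare at least {min_len} residues")
    identical = similar = 0
    for x, y in zip(a.upper(), b.upper()):
        if x == y and x != "-":
            identical += 1  # exact identity also counts as similarity
            similar += 1
        elif x in GROUP_OF and GROUP_OF.get(x) == GROUP_OF.get(y):
            similar += 1    # different but related residues
    n = len(a)
    return 100.0 * identical / n, 100.0 * similar / n


result = percent_identity_and_similarity("LKDSTFE", "IRDSTYE")
print(tuple(round(v, 1) for v in result))  # (57.1, 100.0)
```

In the usage line, four of seven positions are identical and the other three (L/I, K/R, F/Y) fall in the same assumed group, so similarity exceeds identity.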

The nucleic acid molecule may be isolated from any animal such as humans, primates, livestock
animals (e.g. horses, cows, sheep, donkeys, pigs), laboratory test animals (e.g. mice, rats, rabbits,
hamsters, guinea pigs), companion animals (e.g. dogs, cats) or captive wild animals (e.g. deer,
foxes, kangaroos).

The term "derivative" (or its plural "derivatives"), whether in relation to a nucleic acid
molecule or a protein, includes parts, mutants, fragments and analogues as well as hybrid or
fusion molecules and glycosylation variants. Particularly useful derivatives comprise single or
multiple amino acid substitutions, deletions and/or additions to the SOCS amino acid sequence.

Preferably, the derivatives have functional activity or alternatively act as antagonists or agonists.
The present invention further extends to homologues of SOCS, which include the functionally or
structurally related molecules from different animal species. The present invention also
encompasses analogues and mimetics. Mimetics include a class of molecule generally but not
necessarily having a non-amino acid structure and which functionally are capable of acting in an
analogous manner to the protein it mimics, in this case, a SOCS. Mimetics may
comprise a carbohydrate, aromatic ring, lipid or other complex chemical structure or may also
be proteinaceous in composition. Mimetics as well as agonists and antagonists contemplated
herein are conveniently located through systematic searching of environments, such as coral,
marine and freshwater river beds, flora and microorganisms. This is sometimes referred to as
natural product screening. Alternatively, libraries of synthetic chemical compounds may be
screened for potentially useful molecules.

As stated above, the present invention contemplates agonists and antagonists of the SOCS. One
example of an antagonist is an antisense oligonucleotide sequence. Useful oligonucleotides are
those which have a nucleotide sequence complementary to at least a portion of the protein-coding
or "sense" sequence of the nucleotide sequence. These antisense nucleotides can be
used to effect the specific inhibition of gene expression. The antisense approach appears to
inhibit gene expression by forming an anti-parallel duplex through complementary
base pairing between the antisense construct and the targeted mRNA, presumably resulting in
hybridisation arrest of translation. Ribozymes and co-suppression molecules may also be used.
Antisense and other nucleic acid molecules may first need to be chemically modified to permit
penetration of cell membranes and/or to increase their serum half-life or otherwise make them
more stable for in vivo administration. Antibodies may also act as either antagonists or agonists,
although they are more useful in diagnostic applications or in the purification of SOCS proteins.
Antagonists and agonists may also be identified following natural product screening or
screening of libraries of chemical compounds, or may be derivatives or analogues of the SOCS
molecules.
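The complementarity underlying the antisense approach can be illustrated with a short sketch. Assuming a sense-strand DNA stretch written 5' to 3', the antisense sequence that pairs with it anti-parallel is its reverse complement, also read 5' to 3'. This Python example is illustrative only: the function name `antisense_dna` is hypothetical, the input is an arbitrary sequence rather than a SOCS sequence, it handles unmodified DNA bases only (an RNA version would use U), and it does not model the chemical modifications mentioned above.

```python
# Watson-Crick complements for unmodified DNA bases.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def antisense_dna(sense: str) -> str:
    """Return the antisense strand (5'->3') for a sense DNA strand given 5'->3'."""
    seq = sense.replace(" ", "").upper()
    invalid = set(seq) - set("ACGT")
    if invalid:
        raise ValueError(f"unexpected characters: {sorted(invalid)}")
    # Complement each base, then reverse so the result also reads 5'->3'.
    return seq.translate(COMPLEMENT)[::-1]


print(antisense_dna("ATG GCC"))  # GGCCAT
```

Reversing after complementing is what makes the duplex anti-parallel: applying the function twice returns the original sense sequence.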


Accordingly, the present invention extends to analogues of the SOCS proteins of the present
invention. Analogues may be used, for example, in the treatment or prophylaxis of cytokine-mediated
dysfunction such as autoimmunity, immune suppression or hyperactive immunity or
other conditions including but not limited to dysfunctions in the haemopoietic, endocrine, hepatic
and neural systems. Dysfunctions mediated by other signal transducing elements such as
endogenous or exogenous molecules, antigens, microbes and microbial products,
viruses or components thereof, ions, hormones and parasites are also contemplated by the
present invention.

Analogues of the proteins contemplated herein include, but are not limited to, modifications to
side chains, incorporation of unnatural amino acids and/or their derivatives during peptide,
polypeptide or protein synthesis and the use of crosslinkers and other methods which impose
conformational constraints on the proteinaceous molecule or their analogues.

Examples of side chain modifications contemplated by the present invention include
modifications of amino groups such as by reductive alkylation by reaction with an aldehyde
followed by reduction with NaBH4; amidination with methylacetimidate; acylation with acetic
anhydride; carbamoylation of amino groups with cyanate; trinitrobenzylation of amino groups
with 2,4,6-trinitrobenzene sulphonic acid (TNBS); acylation of amino groups with succinic
anhydride and tetrahydrophthalic anhydride; and pyridoxylation of lysine with pyridoxal-5-phosphate
followed by reduction with NaBH4.

The guanidine group of arginine residues may be modified by the formation of heterocyclic 
condensation products with reagents such as 2,3-butanedione, phenylglyoxal and glyoxal. 

The carboxyl group may be modified by carbodiimide activation via O-acylisourea formation
followed by subsequent derivatisation, for example, to a corresponding amide.

Sulphydryl groups may be modified by methods such as carboxymethylation with iodoacetic acid
or iodoacetamide; performic acid oxidation to cysteic acid; formation of mixed disulphides
with other thiol compounds; reaction with maleimide, maleic anhydride or other substituted
maleimide; formation of mercurial derivatives using 4-chloromercuribenzoate,
4-chloromercuriphenylsulphonic acid, phenylmercury chloride, 2-chloromercuri-4-nitrophenol and
other mercurials; carbamoylation with cyanate at alkaline pH.

Tryptophan residues may be modified by, for example, oxidation with N-bromosuccinimide or
alkylation of the indole ring with 2-hydroxy-5-nitrobenzyl bromide or sulphenyl halides.
Tyrosine residues, on the other hand, may be altered by nitration with tetranitromethane to form
a 3-nitrotyrosine derivative.

Modification of the imidazole ring of a histidine residue may be accomplished by alkylation with
iodoacetic acid derivatives or N-carbethoxylation with diethylpyrocarbonate.

Examples of incorporating unnatural amino acids and derivatives during peptide synthesis
include, but are not limited to, use of norleucine, 4-amino butyric acid, 4-amino-3-hydroxy-5-phenylpentanoic
acid, 6-aminohexanoic acid, t-butylglycine, norvaline, phenylglycine, ornithine,
sarcosine, 4-amino-3-hydroxy-6-methylheptanoic acid, 2-thienyl alanine and/or D-isomers of
amino acids. A list of unnatural amino acids contemplated herein is shown in Table 3.

TABLE 3

Code     Non-conventional amino acid
Abu      α-aminobutyric acid
Nmala    L-N-methylalanine
Mgabu    α-amino-α-methylbutyrate
Nmarg    L-N-methylarginine
Cpro     aminocyclopropane-carboxylate
Nmasn    L-N-methylasparagine
Nmasp    L-N-methylaspartic acid
Aib      aminoisobutyric acid
Nmcys    L-N-methylcysteine
Norb     aminonorbornyl-carboxylate
Nmgln    L-N-methylglutamine
Nmglu    L-N-methylglutamic acid
Chexa    cyclohexylalanine
Nmhis    L-N-methylhistidine
Cpen     cyclopentylalanine
Nmile    L-N-methylisoleucine
Dal      D-alanine
Nmleu    L-N-methylleucine
Darg     D-arginine
Nmlys    L-N-methyllysine
Dasp     D-aspartic acid
Nmmet    L-N-methylmethionine
Dcys     D-cysteine
Nmnle    L-N-methylnorleucine
Dgln     D-glutamine
Nmnva    L-N-methylnorvaline
Dglu     D-glutamic acid
Nmorn    L-N-methylornithine
Dhis     D-histidine
Nmphe    L-N-methylphenylalanine
Dile     D-isoleucine
Nmpro    L-N-methylproline
Dleu     D-leucine
Nmser    L-N-methylserine
Dlys     D-lysine
Nmthr    L-N-methylthreonine
Dmet     D-methionine
Nmtrp    L-N-methyltryptophan
Dorn     D-ornithine
Nmtyr    L-N-methyltyrosine
Dphe     D-phenylalanine
Nmval    L-N-methylvaline
Dpro     D-proline
Nmetg    L-N-methylethylglycine
Dser     D-serine
Nmtbug   L-N-methyl-t-butylglycine
Dthr     D-threonine
Nle      L-norleucine
Dtrp     D-tryptophan
Nva      L-norvaline
Dtyr     D-tyrosine
Maib     α-methyl-aminoisobutyrate
Dval     D-valine
Mgabu    α-methyl-γ-aminobutyrate
Dmala    D-α-methylalanine
Mchexa   α-methylcyclohexylalanine
Dmarg    D-α-methylarginine
Mcpen    α-methylcyclopentylalanine
Dmasn    D-α-methylasparagine
Manap    α-methyl-α-naphthylalanine
Dmasp    D-α-methylaspartate
Mpen     α-methylpenicillamine
Dmcys    D-α-methylcysteine
Nglu     N-(4-aminobutyl)glycine
Dmgln    D-α-methylglutamine
Naeg     N-(2-aminoethyl)glycine
Dmhis    D-α-methylhistidine
Norn     N-(3-aminopropyl)glycine
Dmile    D-α-methylisoleucine
Nmaabu   N-amino-α-methylbutyrate
Dmleu    D-α-methylleucine
Anap     α-naphthylalanine
Dmlys    D-α-methyllysine
Nphe     N-benzylglycine
Dmmet    D-α-methylmethionine
Ngln     N-(2-carbamylethyl)glycine
Dmorn    D-α-methylornithine
Nasn     N-(carbamylmethyl)glycine
Dmphe    D-α-methylphenylalanine
Nglu     N-(2-carboxyethyl)glycine
Dmpro    D-α-methylproline
Nasp     N-(carboxymethyl)glycine
Dmser    D-α-methylserine
Ncbut    N-cyclobutylglycine
Dmthr    D-α-methylthreonine
Nchep    N-cycloheptylglycine
Dmtrp    D-α-methyltryptophan
Nchex    N-cyclohexylglycine
Dmty     D-α-methyltyrosine
Ncdec    N-cyclodecylglycine
Dmval    D-α-methylvaline
Ncdod    N-cyclododecylglycine
Dnmala   D-N-methylalanine
Ncoct    N-cyclooctylglycine
Dnmarg   D-N-methylarginine
Ncpro    N-cyclopropylglycine
Dnmasn   D-N-methylasparagine
Ncund    N-cycloundecylglycine
Dnmasp   D-N-methylaspartate
Nbhm     N-(2,2-diphenylethyl)glycine
Dnmcys   D-N-methylcysteine
Nbhe     N-(3,3-diphenylpropyl)glycine
Dnmgln   D-N-methylglutamine
Narg     N-(3-guanidinopropyl)glycine
Dnmglu   D-N-methylglutamate
Nthr     N-(1-hydroxyethyl)glycine
Dnmhis   D-N-methylhistidine
Nser     N-(hydroxyethyl)glycine
Dnmile   D-N-methylisoleucine
Nhis     N-(imidazolylethyl)glycine
Dnmleu   D-N-methylleucine
Nhtrp    N-(3-indolylethyl)glycine
Dnmlys   D-N-methyllysine
Nmgabu   N-methyl-γ-aminobutyrate
Nmchexa  N-methylcyclohexylalanine
Dnmmet   D-N-methylmethionine
Dnmorn   D-N-methylornithine
Nmcpen   N-methylcyclopentylalanine
Nala     N-methylglycine
Dnmphe   D-N-methylphenylalanine
Nmaib    N-methylaminoisobutyrate
Dnmpro   D-N-methylproline
Nile     N-(1-methylpropyl)glycine
Dnmser   D-N-methylserine
Nleu     N-(2-methylpropyl)glycine
Dnmthr   D-N-methylthreonine
Dnmtrp   D-N-methyltryptophan
Nval     N-(1-methylethyl)glycine
Dnmtyr   D-N-methyltyrosine
Nmanap   N-methyl-α-naphthylalanine
Dnmval   D-N-methylvaline
Nmpen    N-methylpenicillamine
Gabu     γ-aminobutyric acid
Nhtyr    N-(p-hydroxyphenyl)glycine
Tbug     L-t-butylglycine
Ncys     N-(thiomethyl)glycine
Etg      L-ethylglycine
Pen      penicillamine
Hphe     L-homophenylalanine
Mala     L-α-methylalanine
Marg     L-α-methylarginine
Masn     L-α-methylasparagine
Masp     L-α-methylaspartate
Mtbug    L-α-methyl-t-butylglycine
Mcys     L-α-methylcysteine
Metg     L-methylethylglycine
Mgln     L-α-methylglutamine
Mglu     L-α-methylglutamate
Mhis     L-α-methylhistidine
Mhphe    L-α-methylhomophenylalanine
Mile     L-α-methylisoleucine
Nmet     N-(2-methylthioethyl)glycine
Mleu     L-α-methylleucine
Mlys     L-α-methyllysine
Mmet     L-α-methylmethionine
Mnle     L-α-methylnorleucine
Mnva     L-α-methylnorvaline
Morn     L-α-methylornithine
Mphe     L-α-methylphenylalanine
Mpro     L-α-methylproline
Mser     L-α-methylserine
Mthr     L-α-methylthreonine
Mtrp     L-α-methyltryptophan
Mtyr     L-α-methyltyrosine
Mval     L-α-methylvaline
Nmhphe   L-N-methylhomophenylalanine
Nnbhm    N-(N-(2,2-diphenylethyl)carbamylmethyl)glycine
Nnbhe    N-(N-(3,3-diphenylpropyl)carbamylmethyl)glycine
Nmbc     1-carboxy-1-(2,2-diphenylethylamino)cyclopropane



Crosslinkers can be used, for example, to stabilise 3D conformations, using homo-bifunctional
crosslinkers such as the bifunctional imido esters having (CH2)n spacer groups with n = 1 to n = 6,
glutaraldehyde, N-hydroxysuccinimide esters and hetero-bifunctional reagents which usually
contain an amino-reactive moiety such as N-hydroxysuccinimide and another group-specific
reactive moiety such as maleimido or dithio moiety (SH) or carbodiimide (COOH). In addition,
peptides can be conformationally constrained by, for example, incorporation of Cα and
Nα-methylamino acids, introduction of double bonds between Cα and Cβ atoms of amino acids and
the formation of cyclic peptides or analogues by introducing covalent bonds such as forming
an amide bond between the N and C termini, between two side chains or between a side chain
and the N or C terminus.

These types of modifications may be important to stabilise the cytokines if administered to an 
individual or for use as a diagnostic reagent. 



Other derivatives contemplated by the present invention include a range of glycosylation 
variants from a completely unglycosylated molecule to a modified glycosylated molecule. 
Altered glycosylation patterns may result from expression of recombinant molecules in different 
host cells. 



Another embodiment of the present invention contemplates a method for modulating
expression of a SOCS protein in a mammal, said method comprising contacting a gene encoding
a SOCS or a factor/element involved in controlling expression of the SOCS gene with an
effective amount of a modulator of SOCS expression for a time and under conditions sufficient
to up-regulate or down-regulate or otherwise modulate expression of SOCS. An example of
a modulator is a cytokine such as IL-6 or other transcription regulators of SOCS expression.
Expression includes transcription or translation or both.

Another aspect of the present invention contemplates a method of modulating activity of SOCS
in a human, said method comprising administering to said human a modulating effective
amount of a molecule for a time and under conditions sufficient to increase or decrease SOCS
activity. The molecule may be a proteinaceous molecule or a chemical entity and may also be
a derivative of SOCS or a chemical analogue or truncation mutant of SOCS.

A further aspect of the present invention provides a method of inducing synthesis of a SOCS
or transcription/translation of a SOCS comprising contacting a cell containing a SOCS gene
with an effective amount of a cytokine capable of inducing said SOCS for a time and under
conditions sufficient for said SOCS to be produced. For example, SOCS1 may be induced by
IL-6.

Still a further aspect of the present invention contemplates a method of modulating levels of a
SOCS protein in a cell, said method comprising contacting a cell containing a SOCS gene with
an effective amount of a modulator of SOCS gene expression or SOCS protein activity for a
time and under conditions sufficient to modulate levels of said SOCS protein.

Yet a further aspect of the present invention contemplates a method of modulating signal
transduction in a cell containing a SOCS gene comprising contacting said cell with an effective
amount of a modulator of SOCS gene expression or SOCS protein activity for a time sufficient
to modulate signal transduction.

Even yet a further aspect of the present invention contemplates a method of influencing
interaction between cells wherein at least one cell carries a SOCS gene, said method comprising
contacting the cell carrying the SOCS gene with an effective amount of a modulator of SOCS
gene expression or SOCS protein activity for a time sufficient to modulate signal transduction.

As stated above, the present invention contemplates a range of mimetics or small molecules
capable of acting as agonists or antagonists of the SOCS. Such molecules may be obtained
from natural product screening such as from coral, soil, plants or the ocean or Antarctic
environments. Alternatively, peptide, polypeptide or protein libraries or chemical libraries may
be readily screened. For example, M1 cells expressing a SOCS do not undergo differentiation
in the presence of IL-6. This system can be used to screen molecules which permit
differentiation in the presence of IL-6 and a SOCS. A range of test cells may be prepared to
screen for antagonists and agonists for a range of cytokines. Such molecules are preferably
small molecules and may be of amino acid origin or of chemical origin. SOCS molecules
interacting with signalling proteins (e.g. JAKs) provide molecular screens to detect molecules
which interfere with or promote this interaction. One such screening protocol involves natural
product screening.

Accordingly, the present invention contemplates a pharmaceutical composition comprising
SOCS or a derivative thereof or a modulator of SOCS expression or SOCS activity and one or
more pharmaceutically acceptable carriers and/or diluents. These components are referred to
as the "active ingredients". These and other aspects of the present invention apply to any SOCS
molecules such as but not limited to SOCS1 to SOCS15.

The pharmaceutical forms containing active ingredients suitable for injectable use include sterile
aqueous solutions (where water soluble) or sterile powders for the extemporaneous preparation
of sterile injectable solutions. The form must be stable under the conditions of manufacture and storage
and must be preserved against the contaminating action of microorganisms such as bacteria and
fungi. The carrier can be a solvent or dispersion medium containing, for example, water,
ethanol, polyol (for example, glycerol, propylene glycol and liquid polyethylene glycol, and the
like), suitable mixtures thereof, and vegetable oils. The proper fluidity can be maintained, for
example, by the use of a coating such as lecithin, by the maintenance of the required particle size
in the case of dispersion and by the use of surfactants. Prevention of the action of
microorganisms can be brought about by various antibacterial and antifungal agents, for
example, parabens, chlorobutanol, phenol, sorbic acid, thimerosal and the like. In many cases,
it will be preferable to include isotonic agents, for example, sugars or sodium chloride.

Prolonged absorption of the injectable compositions can be brought about by the use in the
compositions of agents delaying absorption, for example, aluminum monostearate and gelatin.
Sterile injectable solutions are prepared by incorporating the active compounds in the required
amount in the appropriate solvent with various other ingredients enumerated above, as
required, followed by filter sterilization. In the case of sterile powders for the preparation
of sterile injectable solutions, the preferred methods of preparation are vacuum drying and the
freeze-drying technique, which yield a powder of the active ingredient plus any additional
desired ingredient from a previously sterile-filtered solution thereof.

When the active ingredients are suitably protected they may be orally administered, for example,
with an inert diluent or with an assimilable edible carrier, or they may be enclosed in hard or soft
shell gelatin capsules, or they may be compressed into tablets. For oral therapeutic administration,
the active compound may be incorporated with excipients and used in the form of ingestible
tablets, buccal tablets, troches, capsules, elixirs, suspensions, syrups, wafers and the like. Such
compositions and preparations should contain at least 1% by weight of active compound. The
percentage of the compositions and preparations may, of course, be varied and may
conveniently be between about 5 and about 80% of the weight of the unit. The amount of active
compound in such therapeutically useful compositions is such that a suitable dosage will be
obtained. Preferred compositions or preparations according to the present invention are
prepared so that an oral dosage unit form contains between about 0.1 μg and 2000 mg of active
compound.
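
The weight proportions above lend themselves to a quick check. The following is a minimal sketch, assuming the active compound's share is expressed as a weight fraction of the whole dosage unit; the function names and the 500 mg tablet are hypothetical illustrations, not a disclosed formulation.

```python
# Ranges taken from the text above; everything else is illustrative.
MIN_ACTIVE_FRACTION = 0.01            # "at least 1% by weight"
PREFERRED_FRACTION = (0.05, 0.80)     # "between about 5 and about 80%"
ACTIVE_RANGE_MG = (1e-4, 2000.0)      # 0.1 ug to 2000 mg, expressed in mg


def active_mass_mg(unit_mass_mg: float, weight_fraction: float) -> float:
    """Mass of active compound in one dosage unit, in mg."""
    if not 0.0 < weight_fraction <= 1.0:
        raise ValueError("weight_fraction must lie in (0, 1]")
    return unit_mass_mg * weight_fraction


def check_unit(unit_mass_mg: float, weight_fraction: float) -> dict:
    """Report the active mass and whether it falls inside the stated ranges."""
    active = active_mass_mg(unit_mass_mg, weight_fraction)
    return {
        "active_mg": active,
        "meets_minimum": weight_fraction >= MIN_ACTIVE_FRACTION,
        "in_preferred_band": PREFERRED_FRACTION[0] <= weight_fraction <= PREFERRED_FRACTION[1],
        "in_active_range": ACTIVE_RANGE_MG[0] <= active <= ACTIVE_RANGE_MG[1],
    }


# Hypothetical 500 mg tablet containing 20% active compound.
print(check_unit(500.0, 0.20))
```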


The tablets, troches, pills, capsules and the like may also contain the components listed
hereafter: a binder such as gum, acacia, corn starch or gelatin; excipients such as dicalcium
phosphate; a disintegrating agent such as corn starch, potato starch, alginic acid and the like;
a lubricant such as magnesium stearate; and a sweetening agent such as sucrose, lactose or
saccharin, or a flavouring agent such as peppermint, oil of wintergreen or cherry
flavouring. When the dosage unit form is a capsule, it may contain, in addition to materials of
the above type, a liquid carrier. Various other materials may be present as coatings or to
otherwise modify the physical form of the dosage unit. For instance, tablets, pills, or capsules
may be coated with shellac, sugar or both. A syrup or elixir may contain the active compound,
sucrose as a sweetening agent, methyl and propylparabens as preservatives, a dye and
flavouring such as cherry or orange flavour. Of course, any material used in preparing any
dosage unit form should be pharmaceutically pure and substantially non-toxic in the amounts
employed. In addition, the active compound(s) may be incorporated into sustained-release
preparations and formulations.

The present invention also extends to forms suitable for topical application such as creams,
lotions and gels. 

Pharmaceutically acceptable carriers and/or diluents include any and all solvents, dispersion 
media, coatings, antibacterial and antifungal agents, isotonic and absorption delaying agents and 
the like. The use of such media and agents for pharmaceutical active substances is well known
in the art. Except insofar as any conventional media or agent is incompatible with the active 
ingredient, use thereof in the therapeutic compositions is contemplated. Supplementary active 
ingredients can also be incorporated into the compositions. 

It is especially advantageous to formulate parenteral compositions in dosage unit form for ease
of administration and uniformity of dosage. Dosage unit form as used herein refers to physically
discrete units suited as unitary dosages for the mammalian subjects to be treated, each unit
containing a predetermined quantity of active material calculated to produce the desired
therapeutic effect in association with the required pharmaceutical carrier. The specifications for
the novel dosage unit forms of the invention are dictated by and directly dependent on (a) the
unique characteristics of the active material and the particular therapeutic effect to be achieved,
and (b) the limitations inherent in the art of compounding such an active material for the
treatment of disease in living subjects having a diseased condition in which bodily health is
impaired, as herein disclosed in detail.


The principal active ingredient is compounded for convenient and effective administration in
effective amounts with a suitable pharmaceutically acceptable carrier in dosage unit form as
hereinbefore disclosed. A unit dosage form can, for example, contain the principal active
compound in amounts ranging from 0.5 μg to about 2000 mg. Expressed in proportions, the
active compound is generally present in from about 0.5 μg to about 2000 mg/ml of carrier. In
the case of compositions containing supplementary active ingredients, the dosages are
determined by reference to the usual dose and manner of administration of the said ingredients.
The effective amount may also be conveniently expressed in terms of an amount per kg of body
weight. For example, from about 0.01 ng to about 10,000 mg/kg body weight may be
administered.
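
The per-kilogram expression converts to a total dose and, for a liquid composition, to an administered volume. The sketch below assumes simple linear scaling with body weight and a fixed concentration in the carrier; the subject weight, dose and concentration are hypothetical values chosen only to show the arithmetic.

```python
# Per-kg range stated in the text: about 0.01 ng/kg to about 10,000 mg/kg.
NG_PER_MG = 1e6
DOSE_RANGE_MG_PER_KG = (0.01 / NG_PER_MG, 10_000.0)


def total_dose_mg(dose_mg_per_kg: float, body_weight_kg: float) -> float:
    """Total amount to administer, in mg, for a weight-based dose."""
    if body_weight_kg <= 0:
        raise ValueError("body weight must be positive")
    low, high = DOSE_RANGE_MG_PER_KG
    if not low <= dose_mg_per_kg <= high:
        raise ValueError("dose lies outside the range stated in the text")
    return dose_mg_per_kg * body_weight_kg


def volume_ml(dose_mg: float, concentration_mg_per_ml: float) -> float:
    """Volume of carrier solution that delivers dose_mg at the given concentration."""
    if concentration_mg_per_ml <= 0:
        raise ValueError("concentration must be positive")
    return dose_mg / concentration_mg_per_ml


# Hypothetical example: 70 kg subject, 5 mg/kg, solution at 50 mg/ml.
dose = total_dose_mg(5.0, 70.0)
print(dose, volume_ml(dose, 50.0))  # 350.0 7.0
```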


The pharmaceutical composition may also comprise genetic molecules such as a vector capable 
of transfecting target cells where the vector carries a nucleic acid molecule capable of 
modulating SOCS expression or SOCS activity. The vector may, for example, be a viral vector. 
In this regard, a range of gene therapies is contemplated by the present invention, including
isolating certain cells, genetically manipulating them and returning them to the same subject or to
a genetically related or similar subject.

Still another aspect of the present invention is directed to antibodies to SOCS and its 
derivatives. Such antibodies may be monoclonal or polyclonal and may be selected from 
naturally occurring antibodies to SOCS or may be specifically raised to SOCS or derivatives
thereof. In the case of the latter, SOCS or its derivatives may first need to be associated with 
a carrier molecule. The antibodies and/or recombinant SOCS or its derivatives of the present 
invention are particularly useful as therapeutic or diagnostic agents. 

For example, SOCS and its derivatives can be used to screen for naturally occurring antibodies
to SOCS. These may occur, for example, in some autoimmune diseases. Alternatively, specific
antibodies can be used to screen for SOCS. Techniques for such assays are well known in the
art and include, for example, sandwich assays and ELISA. Knowledge of SOCS levels may be
important for diagnosis of certain cancers or a predisposition to cancers, for monitoring
cytokine-mediated cellular responsiveness or for monitoring certain therapeutic protocols.

Antibodies to SOCS of the present invention may be monoclonal or polyclonal. Alternatively, 
fragments of antibodies may be used such as Fab fragments. Furthermore, the present invention 
extends to recombinant and synthetic antibodies and to antibody hybrids. A "synthetic 
antibody" is considered herein to include fragments and hybrids of antibodies. The antibodies
of this aspect of the present invention are particularly useful for immunotherapy and may also
be used as a diagnostic tool for assessing apoptosis or monitoring the progress of a therapeutic
regimen.

For example, specific antibodies can be used to screen for SOCS proteins. The latter would be 
important, for example, as a means for screening for levels of SOCS in a cell extract or other
biological fluid or purifying SOCS made by recombinant means from culture supernatant fluid. 
Techniques for the assays contemplated herein are known in the art and include, for example, 
sandwich assays and ELISA. 

It is within the scope of this invention to include any second antibodies (monoclonal, polyclonal
or fragments of antibodies or synthetic antibodies) directed to the first mentioned antibodies 
discussed above. Both the first and second antibodies may be used in detection assays or a first 
antibody may be used with a commercially available anti-immunoglobulin antibody. An 
antibody as contemplated herein includes any antibody specific to any region of SOCS. 


Both polyclonal and monoclonal antibodies are obtainable by immunization with the enzyme 
or protein and either type is utilizable for immunoassays. The methods of obtaining both types 
of sera are well known in the art. Polyclonal sera are less preferred but are relatively easily 
prepared by injection of a suitable laboratory animal with an effective amount of SOCS, or 
antigenic parts thereof, collecting serum from the animal, and isolating specific sera by any of
the known immunoadsorbent techniques. Although antibodies produced by this method are 
utilizable in virtually any type of immunoassay, they are generally less favoured because of the 
potential heterogeneity of the product. 

The use of monoclonal antibodies in an immunoassay is particularly preferred because of the
ability to produce them in large quantities and the homogeneity of the product. The preparation 
of hybridoma cell lines for monoclonal antibody production derived by fusing an immortal cell 
line and lymphocytes sensitized against the immunogenic preparation can be done by techniques 
which are well known to those who are skilled in the art. 


Another aspect of the present invention contemplates a method for detecting SOCS in a
biological sample from a subject, said method comprising contacting said biological sample with
an antibody specific for SOCS or its derivatives or homologues for a time and under conditions
sufficient for an antibody-SOCS complex to form and then detecting said complex.

Detection of SOCS may be accomplished in a number of ways, such as by Western blotting
and ELISA procedures. A wide range of immunoassay techniques are available, as can be seen
by reference to US Patent Nos. 4,016,043, 4,424,279 and 4,018,653. These, of course, include
both single-site and two-site or "sandwich" assays of the non-competitive types, as well as
the traditional competitive binding assays. These assays also include direct binding of a labelled
antibody to a target.

Sandwich assays are among the most useful and commonly used assays and are favoured for
use in the present invention. A number of variations of the sandwich assay technique exist, and
all are intended to be encompassed by the present invention. Briefly, in a typical forward assay,
an unlabelled antibody is immobilized on a solid substrate and the sample to be tested is brought
into contact with the bound molecule. After a suitable period of incubation, sufficient to allow
formation of an antibody-antigen complex, a second antibody specific to the antigen, labelled
with a reporter molecule capable of producing a detectable signal, is then added and incubated,
allowing time sufficient for the formation of another complex of antibody-antigen-labelled
antibody. Any unreacted material is washed away, and the presence of the
antigen is determined by observation of a signal produced by the reporter molecule. The results
may either be qualitative, by simple observation of the visible signal, or may be quantitated by
comparing with a control sample containing known amounts of hapten. Variations on the
forward assay include a simultaneous assay, in which both sample and labelled antibody are
added simultaneously to the bound antibody. These techniques are well known to those skilled
in the art, including any minor variations as will be readily apparent. In accordance with the
present invention, the sample is one which might contain SOCS, including cell extract, tissue
biopsy or possibly serum, saliva, mucosal secretions, lymph, tissue fluid and respiratory fluid.
The sample is, therefore, generally a biological sample comprising biological fluid but also
extends to fermentation fluid and supernatant fluid such as from a cell culture.
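
Quantitation against control samples containing known amounts is usually done with a standard curve. As a minimal sketch, assuming a monotonic signal and plain linear interpolation between the bracketing standards (a simplification; a fitted curve is often used instead), the amount in an unknown sample can be estimated as follows. The standard values are hypothetical.

```python
from bisect import bisect_left


def interpolate_amount(signal, standards):
    """Estimate the amount of antigen from a reporter signal.

    standards: (known_amount, measured_signal) pairs from control samples.
    Assumes the signal rises monotonically with amount across the standards
    and uses straight-line interpolation between the two bracketing points.
    Returns None when the signal falls outside the calibrated range.
    """
    points = sorted(standards, key=lambda p: p[1])
    signals = [s for _, s in points]
    if signal < signals[0] or signal > signals[-1]:
        return None  # do not extrapolate beyond the standards
    i = bisect_left(signals, signal)
    if signals[i] == signal:
        return points[i][0]
    (x0, y0), (x1, y1) = points[i - 1], points[i]
    return x0 + (signal - y0) * (x1 - x0) / (y1 - y0)


# Hypothetical standards: (amount in ng/ml, absorbance) for known controls.
standards = [(0.0, 0.05), (10.0, 0.25), (20.0, 0.45), (40.0, 0.85)]
print(interpolate_amount(0.35, standards))
```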

In the typical forward sandwich assay, a first antibody having specificity for SOCS or
antigenic parts thereof is either covalently or passively bound to a solid surface. The solid
surface is typically glass or a polymer, the most commonly used polymers being cellulose,
polyacrylamide, nylon, polystyrene, polyvinyl chloride or polypropylene. The solid supports
may be in the form of tubes, beads, discs of microplates, or any other surface suitable for
conducting an immunoassay. The binding processes are well known in the art and generally
consist of cross-linking, covalently binding or physically adsorbing; the polymer-antibody
complex is then washed in preparation for the test sample. An aliquot of the sample to be tested is
then added to the solid phase complex and incubated for a sufficient period of time (e.g. 2-40
minutes or overnight if more convenient) and under suitable conditions (e.g. room temperature
to 37°C) to allow binding of any antigen present in the sample to the antibody. Following the
incubation period, the antibody-antigen solid phase is washed and dried and incubated with a second
antibody specific for a portion of the hapten. The second antibody is linked to a reporter
molecule which is used to indicate the binding of the second antibody to the hapten.


An alternative method involves immobilizing the target molecules in the biological sample and 
then exposing the immobilized target to specific antibody which may or may not be labelled 
with a reporter molecule. Depending on the amount of target and the strength of the reporter 
molecule signal, a bound target may be detectable by direct labelling with the antibody. 
Alternatively, a second labelled antibody, specific to the first antibody, is exposed to the
target-first antibody complex to form a target-first antibody-second antibody tertiary complex. The
complex is detected by the signal emitted by the reporter molecule. 

By "reporter molecule" as used in the present specification, is meant a molecule which, by its 
chemical nature, provides an analytically identifiable signal which allows the detection of
antigen-bound antibody. Detection may be either qualitative or quantitative. The most 
commonly used reporter molecules in this type of assay are either enzymes, fluorophores or 
radionuclide containing molecules (i.e. radioisotopes) and chemiluminescent molecules. 

In the case of an enzyme immunoassay, an enzyme is conjugated to the second antibody,
generally by means of glutaraldehyde or periodate. As will be readily recognized, however, a
wide variety of different conjugation techniques exist, which are readily available to the skilled
artisan. Commonly used enzymes include horseradish peroxidase, glucose oxidase,
beta-galactosidase and alkaline phosphatase, amongst others. The substrates to be used with the
specific enzymes are generally chosen for the production, upon hydrolysis by the corresponding
enzyme, of a detectable colour change. Examples of suitable enzymes include alkaline
phosphatase and peroxidase. It is also possible to employ fluorogenic substrates, which yield
a fluorescent product rather than the chromogenic substrates noted above. In all cases, the
enzyme-labelled antibody is added to the first antibody-hapten complex, allowed to bind, and
then the excess reagent is washed away. A solution containing the appropriate substrate is then
added to the complex of antibody-antigen-antibody. The substrate will react with the enzyme
linked to the second antibody, giving a qualitative visual signal, which may be further
quantitated, usually spectrophotometrically, to give an indication of the amount of hapten which
was present in the sample. "Reporter molecule" also extends to use of cell agglutination or
inhibition of agglutination, such as red blood cells on latex beads, and the like.
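
Spectrophotometric quantitation typically subtracts a substrate blank and reads the corrected absorbance off a calibration curve. The sketch below assumes a four-parameter logistic (4PL) model, a common choice for immunoassays but not one specified in the text, with hypothetical parameters taken as already fitted to standards; the fitting step itself is left out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FourParameterLogistic:
    """y = d + (a - d) / (1 + (x / c) ** b), a common immunoassay curve model."""
    a: float  # response with no analyte
    b: float  # slope factor
    c: float  # analyte amount giving the mid-point response
    d: float  # response at saturating analyte

    def response(self, x: float) -> float:
        return self.d + (self.a - self.d) / (1.0 + (x / self.c) ** self.b)

    def amount(self, y: float) -> float:
        """Invert the curve; only valid strictly between the two asymptotes."""
        low, high = sorted((self.a, self.d))
        if not low < y < high:
            raise ValueError("response outside the quantifiable range of the curve")
        return self.c * ((self.a - self.d) / (y - self.d) - 1.0) ** (1.0 / self.b)


def blank_corrected(od_sample: float, od_blank: float) -> float:
    """Subtract the substrate-only (blank) absorbance from a sample reading."""
    return od_sample - od_blank


# Hypothetical parameters, as if already fitted to a set of standards.
curve = FourParameterLogistic(a=0.05, b=1.2, c=25.0, d=2.0)
od = blank_corrected(0.80, 0.06)
print(round(curve.amount(od), 2))
```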


Alternately, fluorescent compounds, such as fluorescein and rhodamine, may be chemically
coupled to antibodies without altering their binding capacity. When activated by illumination
with light of a particular wavelength, the fluorochrome-labelled antibody absorbs the light
energy, inducing a state of excitability in the molecule, followed by emission of the light at a
characteristic colour visually detectable with a light microscope. As in the EIA, the fluorescent
labelled antibody is allowed to bind to the first antibody-hapten complex. After washing off the
unbound reagent, the remaining tertiary complex is then exposed to light of the appropriate
wavelength; the fluorescence observed indicates the presence of the hapten of interest.
Immunofluorescence and EIA techniques are both very well established in the art and are
particularly preferred for the present method. However, other reporter molecules, such as
radioisotope, chemiluminescent or bioluminescent molecules, may also be employed.

The present invention also contemplates genetic assays, such as those involving PCR analysis, to
detect the SOCS gene or its derivatives. Alternative methods, or methods used in conjunction, include
direct nucleotide sequencing or mutation scanning such as single-stranded conformation
polymorphism (SSCP) analysis or specific oligonucleotide hybridisation, or methods such as
direct protein truncation tests.

Since cytokines are involved in transcription of some SOCS molecules, the detection of SOCS
provides surrogate markers for cytokines or cytokine activity. This may be useful in assessing
subjects with a range of conditions, such as those with autoimmune diseases, for example,
rheumatoid arthritis, diabetes and stiff man syndrome, amongst others.

The nucleic acid molecules of the present invention may be DNA or RNA. When the nucleic 
acid molecule is in DNA form, it may be genomic DNA or cDNA. RNA forms of the nucleic 
acid molecules of the present invention are generally mRNA.

Although the nucleic acid molecules of the present invention are generally in isolated form, they 
may be integrated into or ligated to or otherwise fused or associated with other genetic 
molecules such as vector molecules and in particular expression vector molecules. Vectors and 
expression vectors are generally capable of replication and, if applicable, expression in one or
both of a prokaryotic cell or a eukaryotic cell. Preferred prokaryotic cells include E. coli,
Bacillus sp. and Pseudomonas sp. Preferred eukaryotic cells include yeast, fungal, mammalian
and insect cells.

Accordingly, another aspect of the present invention contemplates a genetic construct
comprising a vector portion and a mammalian, and more particularly a human, SOCS gene
portion, which SOCS gene portion is capable of encoding a SOCS polypeptide or a functional
or immunologically interactive derivative thereof.

Preferably, the SOCS gene portion of the genetic construct is operably linked to a promoter on
the vector such that said promoter is capable of directing expression of said SOCS gene portion 
in an appropriate cell. 

In addition, the SOCS gene portion of the genetic construct may comprise all or part of the 
gene fused to another genetic sequence such as a nucleotide sequence encoding
glutathione-S-transferase or part thereof.

The present invention extends to such genetic constructs and to prokaryotic or eukaryotic cells
comprising same.

The present invention also extends to any or all derivatives of SOCS including mutants, parts,
fragments, portions, homologues and analogues or their encoding genetic sequence including
single or multiple nucleotide or amino acid substitutions, additions and/or deletions to the 
naturally occurring nucleotide or amino acid sequence. The present invention also extends to 
mimetics and agonists and antagonists of SOCS. 

The SOCS and its genetic sequence of the present invention will be useful in the generation of
a range of therapeutic and diagnostic reagents and will be especially useful in the detection of
a cytokine involved in a particular cellular response or a receptor for that cytokine. For
example, cells expressing a SOCS gene, such as M1 cells expressing the SOCS1 gene, will no
longer be responsive to a particular cytokine such as, in the case of SOCS1, IL-6. Clearly, the
present invention further contemplates cells such as M1 cells expressing any SOCS gene, such
as SOCS1 to SOCS15. Furthermore, the present invention provides the use of molecules
that regulate or potentiate the activity of therapeutic cytokines. For example, molecules which
block some SOCS activity may act to potentiate therapeutic cytokine activity (e.g. G-CSF).

Soluble SOCS polypeptides are also contemplated to be particularly useful in the treatment of
disease, injury or abnormality involving cytokine mediated cellular responsiveness such as 
hyperimmunity, immunosuppression, allergies, hypertension and the like. 

A further aspect of the present invention contemplates the use of SOCS or its functional 
derivatives in the manufacture of a medicament for the treatment of conditions involving
cytokine mediated cellular responsiveness. 

The present invention further contemplates transgenic mammalian cells expressing a SOCS 
gene. Such cells are useful indicator cell lines for assaying for suppression of cytokine function.
One example is M1 cells expressing a SOCS gene. Such cell lines may be useful for screening
for cytokines or screening molecules, such as naturally occurring molecules from plants, coral,
microorganisms or bio-organically active soil or water, capable of acting as cytokine antagonists
or agonists.

The present invention further contemplates hybrids between different SOCS from the same or 
different animal species. For example, a hybrid may be formed between all or a functional part
of mouse SOCS1 and human SOCS1. Alternatively, the hybrid may be between all or part of
mouse SOCS1 and mouse SOCS2. All such hybrids are contemplated herein and are
particularly useful in developing pleiotropic molecules. 

The present invention further contemplates a range of genetic-based diagnostic assays screening
for individuals with defective SOCS genes. Such mutations may result in cell types not being
responsive to a particular cytokine or in over-responsiveness leading to a range of
conditions. The SOCS genetic sequence can be readily verified using a range of PCR or other
techniques to determine whether a mutation is resident in the gene. Appropriate gene therapy
or other interventionist therapy may then be adopted.
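
Once a SOCS amplicon from a subject has been sequenced, the comparison against a reference sequence reduces, in the simplest case, to scanning for mismatches. This is a minimal sketch of that comparison only, assuming pre-aligned sequences of equal length; the sequences shown are hypothetical, and real mutation screening would also need alignment, quality filtering and handling of heterozygous calls.

```python
from typing import List, Tuple


def find_substitutions(reference: str, sample: str) -> List[Tuple[int, str, str]]:
    """Return (1-based position, reference base, sample base) for each mismatch.

    Assumes the two sequences are already aligned and of equal length, for
    example two reads of the same PCR amplicon. Insertions and deletions need
    a proper alignment step and are not handled here. 'N' in the sample is
    treated as an uncalled base rather than a mutation.
    """
    ref, smp = reference.upper(), sample.upper()
    if len(ref) != len(smp):
        raise ValueError("sequences must be pre-aligned to equal length")
    return [
        (pos, r, s)
        for pos, (r, s) in enumerate(zip(ref, smp), start=1)
        if r != s and s != "N"
    ]


# Hypothetical sequences, not taken from any SOCS gene.
reference = "ATGGTAGCACGC"
sample = "ATGGTGGCACNC"
print(find_substitutions(reference, sample))  # [(6, 'A', 'G')]
```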

The present invention is further described by the following non-limiting Examples. 

Examples 1-16 relate to SOCS1, SOCS2 and SOCS3, which were identified on the basis of
activity. Examples 17-24 relate to various aspects of SOCS4 to SOCS15, which were cloned
initially on the basis of sequence similarity. Examples 25-36 relate to specific aspects of SOCS4
to SOCS15, respectively.

EXAMPLE 1

CELL CULTURE AND CYTOKINES 
The M1 cell line was derived from a spontaneously arising leukaemia in SL mice [Ichikawa,
1969]. Parental M1 cells used in this study have been in passage at the Walter and Eliza Hall
Institute for Medical Research, Melbourne, Victoria, Australia, for approximately 10 years. M1
cells were maintained by weekly passage in Dulbecco's modified Eagle's medium (DME)
containing 10% (v/v) foetal bovine serum (FCS). Recombinant cytokines are generally
available from commercial sources or were prepared by published methods. Recombinant
murine LIF was produced in Escherichia coli and purified, as previously described [Gearing,
1989]. Purified human oncostatin M was purchased from PeproTech Inc (Rocky Hill, NJ,
USA), and purified mouse IFN-γ was obtained from Genzyme Diagnostics (Cambridge, MA,
USA). Recombinant murine thrombopoietin was produced as a FLAG™-tagged fusion
protein in CHO cells and then purified.

EXAMPLE 2 

AGAR COLONY ASSAYS

In order to assay the differentiation of M1 cells in response to cytokines, 300 cells were
cultured in 35 mm Petri dishes containing 1 ml of DME supplemented with 20% (v/v) foetal calf
serum (FCS), 0.3% (w/v) agar and 0.1 ml of serial dilutions of IL-6, LIF, OSM, IFN-γ, TPO or
dexamethasone (Sigma Chemical Company, St Louis, MO). After 7 days of culture at 37°C in a
fully humidified atmosphere containing 10% (v/v) CO2 in air, colonies of M1 cells were
counted and classified as differentiated if they were composed of dispersed cells or had a corona
of dispersed cells around a tightly packed centre.
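
The scoring step lends itself to a short tally. The sketch below assumes each colony has already been assigned one of three hypothetical morphology labels ("compact", "corona", "dispersed") and that the cytokine was titrated in a twofold series from a hypothetical top concentration; neither the labels nor the dilution factor is specified in the text.

```python
# Scoring rule from the text: a colony counts as differentiated if it is made
# of dispersed cells or has a corona of dispersed cells round a compact centre.
DIFFERENTIATED = {"dispersed", "corona"}


def serial_dilutions(top: float, factor: float, steps: int) -> list:
    """Concentrations of a serial dilution series, starting at `top`."""
    return [top / factor ** i for i in range(steps)]


def score_plate(morphologies: list) -> tuple:
    """Return (total colonies, differentiated colonies, percent differentiated)."""
    total = len(morphologies)
    differentiated = sum(1 for m in morphologies if m in DIFFERENTIATED)
    percent = 100.0 * differentiated / total if total else 0.0
    return total, differentiated, percent


# Hypothetical plate: 30 compact, 15 corona and 5 dispersed colonies.
plate = ["compact"] * 30 + ["corona"] * 15 + ["dispersed"] * 5
print(serial_dilutions(100.0, 2.0, 6))  # [100.0, 50.0, 25.0, 12.5, 6.25, 3.125]
print(score_plate(plate))               # (50, 20, 40.0)
```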

EXAMPLE 3 

GENERATION OF RETROVIRAL LIBRARY

A cDNA expression library was constructed from the factor-dependent haemopoietic cell line
FDC-P1, essentially as described [Rayner, 1994]. Briefly, cDNA was cloned into the retroviral
vector pRUFneo and then transfected into an amphotropic packaging cell line (PA317).
Transiently generated virus was harvested from the cell supernatant at 48 hr post-transfection
and used to infect Ψ2 ecotropic packaging cells, to generate a high-titre virus-producing cell
line.

EXAMPLE 4 
RETROVIRAL INFECTION OF M1 CELLS

Pools of 10^ infected Ψ2 cells were irradiated (3000 rad) and cocultivated with 10^ M1 cells
in DME supplemented with 10% (v/v) FCS and 4 μg/ml Polybrene for 2 days at 37°C. To
select for IL-6-unresponsive clones, retrovirally infected M1 cells were washed once in DME
and cultured at approximately 2x10^ cells/ml in 1 ml agar cultures containing 400 μg/ml
geneticin (GibcoBRL, Grand Island, NY) and 100 ng/ml IL-6. The efficiency of infection of
M1 cells was 1-2%, as estimated by agar plating the infected cells in the presence of geneticin
only.
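
The infection efficiency quoted above is a simple ratio of drug-resistant colonies to cells plated. A minimal sketch follows, with hypothetical colony counts chosen to fall within the 1-2% range reported; the optional correction for cloning efficiency is an assumption added for illustration and is not part of the described procedure.

```python
from typing import Optional


def infection_efficiency(resistant_colonies: int, cells_plated: int,
                         unselected_colonies: Optional[int] = None) -> float:
    """Percentage of plated cells giving geneticin-resistant colonies.

    If a parallel plate without geneticin was counted, passing its colony
    number normalises for cloning efficiency (an optional refinement that
    the text does not describe).
    """
    if cells_plated <= 0:
        raise ValueError("cells_plated must be positive")
    if unselected_colonies is not None:
        if unselected_colonies <= 0:
            raise ValueError("unselected_colonies must be positive")
        denominator = unselected_colonies
    else:
        denominator = cells_plated
    return 100.0 * resistant_colonies / denominator


# Hypothetical counts: 10,000 cells plated, 150 resistant colonies,
# 8,000 colonies on a plate without geneticin.
print(infection_efficiency(150, 10_000))         # 1.5
print(infection_efficiency(150, 10_000, 8_000))  # 1.875
```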

EXAMPLE 5 
PCR 

Genomic DNA from retrovirally infected M1 cells was digested with Sac I, and 1 μg of
phenol/chloroform-extracted DNA was then amplified by polymerase chain reaction (PCR).
Primers used for amplification of cDNA inserts from the integrated retrovirus were GAG3 (5'
CACGCCGCCCACGTGAAGGC 3' [SEQ ID NO:1]), which corresponds to the vector gag
sequence approximately 30 bp 5' of the multiple cloning site, and HSVTK (5'
TTCGCCAATGACAAGACGCT 3' [SEQ ID NO:2]), which corresponds to the pMC1neo
sequence approximately 200 bp 3' of the multiple cloning site. The PCR entailed an initial
denaturation at 94°C for 5 min, 35 cycles of denaturation at 94°C for 1 min, annealing at 56°C
for 2 min, and extension at 72°C for 3 min, followed by a final 10 min extension. PCR products
were gel purified and then ligated into the pGEM-T plasmid (Promega, Madison, WI), and
sequenced using an ABI PRISM Dye Terminator Cycle Sequencing Kit and a Model 373
Automated DNA Sequencer (Applied Biosystems Inc., Foster City, CA).
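
The primer and cycling details can be sanity-checked with a few lines of code. This sketch computes GC content and a rough Wallace-rule melting temperature for the two primers, and the total hold time of the program; the Wallace rule is only an approximation for short oligonucleotides, and the 72°C temperature for the final extension is assumed because the text does not state it.

```python
# Primer sequences (5' to 3') and cycling conditions as given in the text.
GAG3 = "CACGCCGCCCACGTGAAGGC"   # SEQ ID NO:1
HSVTK = "TTCGCCAATGACAAGACGCT"  # SEQ ID NO:2

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def gc_fraction(seq: str) -> float:
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


def wallace_tm(seq: str) -> int:
    """Rough melting temperature, 2*(A+T) + 4*(G+C) in deg C (Wallace rule)."""
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    return 2 * at + 4 * gc


def reverse_complement(seq: str) -> str:
    return seq.upper().translate(_COMPLEMENT)[::-1]


# (temperature deg C, minutes) for each step; ramp times are ignored.
INITIAL = [(94, 5)]
CYCLE = [(94, 1), (56, 2), (72, 3)]
FINAL = [(72, 10)]  # temperature assumed; the text gives only the duration
CYCLES = 35


def _hold(steps) -> int:
    return sum(minutes for _, minutes in steps)


def program_minutes() -> int:
    """Hold time of the whole program, excluding ramping between steps."""
    return _hold(INITIAL) + CYCLES * _hold(CYCLE) + _hold(FINAL)


for name, primer in (("GAG3", GAG3), ("HSVTK", HSVTK)):
    print(name, len(primer), f"{gc_fraction(primer):.0%}", wallace_tm(primer))
# GAG3 20 75% 70
# HSVTK 20 50% 60
print(program_minutes())  # 225
```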

EXAMPLE 6
CLONING OF cDNAs 

Independent cDNA clones encoding mouse SOCS1 were isolated from a murine thymus cDNA
library essentially as described (Hilton et al 1994). The nucleotide and predicted amino acid
sequences of mouse SOCS1 cDNA were compared to databases using the BLASTN and
TFASTA algorithms (Pearson and Lipman, 1988; Pearson, 1990; Altschul et al 1990).
Oligonucleotides were designed from the ESTs encoding human SOCS1 and mouse SOC-1 and
SOCS3 and used to probe commercially available mouse thymus and spleen cDNA libraries.
Sequencing was performed using an ABI automated sequencer according to the manufacturer's
instructions.

EXAMPLE 7 

SOUTHERN AND NORTHERN BLOT ANALYSES AND RT-PCR 

32P-labelled probes were generated using a random decanucleotide labelling kit (Bresatec,
Adelaide, South Australia) from a 600 bp Pst I fragment encoding neomycin phosphotransferase
from the plasmid pPGKneo, a 1070 bp fragment of the SOCS1 gene obtained by digestion of the
1.4 kbp PCR product with Xho I, SOCS2, SOCS3, CIS and a 1.2 kbp fragment of the chicken
glyceraldehyde 3-phosphate dehydrogenase gene [Dugaiczyk, 1983].

Genomic DNA was isolated from cells using a proteinase K-sodium dodecyl sulfate procedure
essentially as described. Fifteen micrograms of DNA was digested with either BamH I or Sac
I, fractionated on a 0.8% (w/v) agarose gel, transferred to GeneScreenPlus membrane (Du Pont
NEN, Boston, MA), prehybridised, hybridised with random-primed 32P-labelled DNA fragments
and washed essentially as described [Sambrook, 1989].


Total RNA was isolated from cells and tissues using Trizol Reagent, as recommended by the 
manufacturer (GibcoBRL, Grand Island, NY). When required, poly(A)+ mRNA was purified
essentially as described [Alexander, 1995]. Northern blots were prehybridised, hybridized with 
random-primed 32P-labelled DNA fragments and washed as described [Alexander, 1995]. 


To assess the induction of SOCS genes by IL-6, mice (C57BL6) were injected intravenously 
with 5 μg IL-6 followed by harvest of the liver at the indicated timepoints after injection. M1
cells were cultured in the presence of 20 ng/ml IL-6 and harvested at the indicated times. For
RT-PCR analysis, bone marrow cells were harvested as described (Metcalf et al, 1995) and
stimulated for 1 hr at 37°C with 100 ng/ml of a range of cytokines. RT-PCR was performed
on total RNA as described (Metcalf et al, 1995). PCR products were resolved on an agarose
gel and Southern blots were hybridised with probes specific for each SOCS family member.
Expression of β-actin was assessed to ensure uniformity of amplification.
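
The β-actin control is there to confirm comparable amplification across samples. Where band intensities are measured, one common way to use it, assumed here rather than described in the text, is to divide each SOCS signal by the matching β-actin signal and compare against the unstimulated sample. The densitometry values below are hypothetical, and end-point RT-PCR of this kind is at best semi-quantitative.

```python
def normalise(target: dict, reference: dict) -> dict:
    """Divide each sample's target-band intensity by its beta-actin intensity."""
    return {sample: target[sample] / reference[sample] for sample in target}


def fold_change(normalised: dict, control: str) -> dict:
    """Express normalised values relative to the control sample."""
    baseline = normalised[control]
    return {sample: value / baseline for sample, value in normalised.items()}


# Hypothetical densitometry values (arbitrary units) for one SOCS probe.
socs = {"saline": 120.0, "IL-6": 960.0}
beta_actin = {"saline": 1000.0, "IL-6": 800.0}
print(fold_change(normalise(socs, beta_actin), "saline"))
```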

EXAMPLE 8

DNA CONSTRUCTS AND TRANSFECTION

A cDNA encoding epitope-tagged SOCS1 was generated by subcloning the entire SOCS1
coding region into the pEF-BOS expression vector [Mizushima, 1990], engineered to encode
an in-frame FLAG epitope downstream of an initiation methionine (pF-SOCS1). Using
electroporation as described previously [Hilton, 1994], M1 cells expressing the thrombopoietin
receptor (M1.mpl) were transfected with 20 μg of Aat II-digested pF-SOCS1 expression
plasmid and 2 μg of a Sca I-digested plasmid in which transcription of a cDNA encoding
puromycin N-acetyl transferase was driven from the mouse phosphoglycerate kinase promoter
(pPGKPuropA). After 48 hours in culture, transfected cells were selected with 20 μg/ml
puromycin (Sigma Chemical Company, St Louis, MO) and screened for expression of SOCS1
by Western blotting, using the M2 anti-FLAG monoclonal antibody according to the
manufacturer's instructions (Eastman Kodak, Rochester, NY). In other experiments, M1 cells
were transfected with only the pF-SOCS1 plasmid or a control and selected by their ability to
grow in agar in the presence of 100 ng/ml of IL-6.

EXAMPLE 9

IMMUNOPRECIPITATION AND WESTERN BLOTTING 

Prior to either immunoprecipitation or Western blotting, 10^ M1 cells or their derivatives were
washed twice, resuspended in 1 ml of DME, and incubated at 37°C for 30 min. The cells were
then stimulated for 4 min at 37°C with either saline or 100 ng/ml IL-6, after which sodium
vanadate (Sigma Chemical Co., St Louis, MO) was added to a concentration of 1 mM. Cells
were placed on ice, washed once with saline containing 1 mM sodium vanadate, and then
solubilised for 5 min on ice with 300 μl 1% (v/v) Triton X-100, 150 mM NaCl, 2 mM EDTA,
50 mM Tris-HCl pH 7.4, containing Complete protease inhibitors (Boehringer Mannheim,
Mannheim, Germany) and 1 mM sodium vanadate. Lysates were cleared by centrifugation and
quantitated using a Coomassie Protein Assay Reagent (Pierce, Rockford, IL).
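
Making up the lysis buffer is a routine dilution calculation. The sketch below applies C1V1 = C2V2 to each component for a chosen final volume; the stock concentrations are hypothetical choices, and the protease inhibitor cocktail is left out because it is added according to the manufacturer's directions.

```python
# Final composition from the text; the stock concentrations are hypothetical.
# Each entry: component -> (final concentration, stock concentration), same units.
LYSIS_BUFFER = {
    "Triton X-100 (% v/v)": (1.0, 10.0),
    "NaCl (mM)": (150.0, 5000.0),
    "EDTA (mM)": (2.0, 500.0),
    "Tris-HCl pH 7.4 (mM)": (50.0, 1000.0),
    "sodium vanadate (mM)": (1.0, 100.0),
}


def stock_volumes(recipe: dict, final_volume_ml: float) -> dict:
    """Volume of each stock (ml) needed for final_volume_ml, via C1*V1 = C2*V2."""
    volumes = {}
    for name, (final, stock) in recipe.items():
        if final > stock:
            raise ValueError(f"{name}: stock is weaker than the final concentration")
        volumes[name] = final_volume_ml * final / stock
    volumes["water"] = final_volume_ml - sum(volumes.values())
    if volumes["water"] < 0:
        raise ValueError("stocks are too dilute to reach the final volume")
    return volumes


# Protease inhibitors are added per the manufacturer and are not modelled here.
for component, ml in stock_volumes(LYSIS_BUFFER, 10.0).items():
    print(f"{component:24s} {ml:6.2f} ml")
```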

For immunoprecipitations, equal amounts of protein extract (1-2 mg) were incubated
for 1 hr or overnight at 4°C with either 4 μg of anti-gp130 antibody (M20; Santa Cruz
Biotechnology Inc., Santa Cruz, CA) or 4 μg of anti-phosphotyrosine antibody (4G10; Upstate
Biotechnology Inc., Lake Placid, NY), and 15 μl packed volume of Protein G Sepharose
(Pharmacia, Uppsala, Sweden) [Hilton et al, 1996]. Immunoprecipitates were washed twice
in 1% (v/v) NP40, 150 mM NaCl, 50 mM Tris-HCl pH 8.0, containing Complete protease
inhibitors (Boehringer Mannheim, Mannheim, Germany) and 1 mM sodium vanadate. The
samples were heated for 5 min at 95°C in SDS sample buffer (625 mM Tris-HCl pH 6.8, 0.05%
(w/v) SDS, 0.1% (v/v) glycerol, bromophenol blue, 0.125% (v/v) 2-mercaptoethanol),
fractionated by SDS-PAGE and immunoblotted as described below.

For Western blotting, 10 μg of protein from a cellular extract or material from an
immunoprecipitation reaction was loaded onto 4-15% Ready gels (Bio-Rad Laboratories,
Hercules CA), and resolved by sodium dodecyl sulfate polyacrylamide gel electrophoresis
(SDS-PAGE). Proteins were transferred to PVDF membrane (Micron Separations Inc.,
Westborough MA) for 1 hr at 100 V. The membranes were probed with the following primary
antibodies: anti-tyrosine phosphorylated STAT3 (1:1000 dilution; New England Biolabs,
Beverly, MA); anti-STAT3 (C-20; 1:100 dilution; Santa Cruz Biotechnology Inc., Santa Cruz
CA); anti-gp130 (M20, 1:100 dilution; Santa Cruz Biotechnology Inc., Santa Cruz CA);
anti-phosphotyrosine (horseradish peroxidase-conjugated RC20, 1:5000 dilution; Transduction
Laboratories, Lexington KY); anti-tyrosine phosphorylated MAP kinase and anti-MAP kinase
antibodies (1:1000 dilution; New England Biolabs, Beverly, MA). Blots were visualised using
peroxidase-conjugated secondary antibodies and Enhanced Chemiluminescence (ECL) reagents
according to the manufacturer's instructions (Pierce, Rockford IL).

EXAMPLE 10 
ELECTROPHORETIC MOBILITY SHIFT ASSAYS 

Assays were performed as described [Novak, 1995], using the high affinity SIF (c-sis-inducible
factor) binding site m67 [Wakao, 1994]. Protein extracts were prepared from M1 cells
incubated for 4-10 min at 37°C in 10 ml serum-free DME containing either saline, 100 ng/ml
IL-6 or 100 ng/ml IFN-γ. The binding reactions contained 4-6 μg protein (constant within a
given experiment), 5 ng ³²P-labelled m67 oligonucleotide, and 800 ng sonicated salmon sperm
DNA. For certain experiments, protein samples were preincubated with an excess of unlabelled
m67 oligonucleotide, or antibodies specific for either STAT1 (Transduction Laboratories,
Lexington, KY) or STAT3 (Santa Cruz Biotechnology Inc., Santa Cruz CA), as described
[Novak, 1995].

Western blots were performed using anti-tyrosine phosphorylated STAT3 or anti-STAT3 (New
England Biolabs, Beverly, MA) or anti-gp130 (Santa Cruz Biotechnology Inc.) as described
(Nicola et al 1996). EMSAs were performed using the m67 oligonucleotide probe, as described
(Novak et al, 1995).

EXAMPLE 11

EXPRESSION CLONING OF A NOVEL SUPPRESSOR OF 
CYTOKINE SIGNAL TRANSDUCTION 

In order to identify cDNAs capable of suppressing cytokine signal transduction, an expression
cloning approach was adopted. This strategy centred on M1 cells, a monocytic leukaemia cell
line that differentiates into mature macrophages and ceases proliferation in response to the
cytokines IL-6, LIF, OSM and IFN-γ, and the steroid dexamethasone. Parental M1 cells were
infected with the RUFneo retrovirus, into which cDNAs from the factor-dependent
haemopoietic cell line FDC-P1 had been cloned. In this retrovirus, transcription of both the
neomycin resistance gene and the cloned cDNA was driven from the powerful constitutive
promoter present in the retroviral LTR (Figure 1). When cultured in semi-solid agar, parental
M1 cells form large tightly packed colonies. Upon stimulation with IL-6, M1 cells undergo
rapid differentiation, resulting in the formation in agar of only single macrophages or small
dispersed clusters of cells. Retrovirally infected M1 cells that were unresponsive to IL-6 were
selected in semi-solid agar culture by their ability to form large, tightly packed colonies in the
presence of IL-6 and geneticin. A single stable IL-6-unresponsive clone, 4A2, was obtained
after examining 10^ infected cells.

A fragment of the neomycin phosphotransferase (neo) gene was used to probe a Southern blot
of genomic DNA from clone 4A2, and this revealed that the cell line was infected with a single
retrovirus containing a cDNA approximately 1.4 kbp in length (Figure 2). PCR amplification
using primers from the retroviral vector which flanked the cDNA cloning site enabled recovery
of a 1.4 kbp cDNA insert, which we have named suppressor of cytokine signalling-1, or
SOCS1. This PCR product was used to probe a similar Southern blot of 4A2 genomic DNA
and hybridised to two fragments: one corresponded to the endogenous SOCS1 gene and
the other, which matched the size of the band seen using the neo probe, corresponded to the
SOCS1 cDNA cloned into the integrated retrovirus (Figure 2). The latter was not observed in
an M1 cell clone infected with a retrovirus containing an irrelevant cDNA. Similarly, Northern
blot analysis revealed that SOCS1 mRNA was abundant in the cell line 4A2, but not in the
control infected M1 cell clone (Figure 2).
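The PCR recovery step relies on primers that sit in the vector on either side of the cloning site, so whatever cDNA lies between them is amplified. A minimal in-silico sketch, assuming exact primer matches and hypothetical toy vector, insert and primer sequences (the actual RUFneo primers are not given here):

```python
from typing import Optional


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A/C/G/T/N only)."""
    pairs = {"A": "T", "T": "A", "G": "C", "C": "G", "N": "N"}
    return "".join(pairs[base] for base in reversed(seq.upper()))


def in_silico_pcr(template: str, forward: str, reverse: str) -> Optional[str]:
    """Return the amplicon between two flanking primers, or None if absent.

    Exact matching only: the forward primer is found on the given strand and
    the reverse primer is located downstream as its reverse complement.
    """
    t = template.upper()
    start = t.find(forward.upper())
    if start == -1:
        return None
    rc_reverse = reverse_complement(reverse)
    end = t.find(rc_reverse, start + len(forward))
    if end == -1:
        return None
    return t[start:end + len(rc_reverse)]


# Hypothetical toy provirus: left vector arm + cloned insert + right vector arm.
vector_left = "TTTTACGTAC"
insert = "GGGCCCAAA"
vector_right = "GTTGCATTTT"
provirus = vector_left + insert + vector_right

product = in_silico_pcr(provirus, forward="ACGTAC", reverse="TGCAAC")
print(product)  # ACGTACGGGCCCAAAGTTGCA
```

Real primer binding tolerates mismatches, which this exact-match sketch deliberately ignores.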
EXAMPLE 12

SOCS1, SOCS2, SOCS3 AND CIS DEFINE A NEW FAMILY
OF SH2-CONTAINING PROTEINS

The SOCS1 PCR product was used as a probe to isolate homologous cDNAs from a mouse
thymus cDNA library. The sequence of the cDNAs proved to be identical to the PCR product,
suggesting that constitutive expression or overexpression, rather than mutation, of the SOCS1
protein was sufficient for generating an IL-6-unresponsive phenotype. Comparison of the
sequence of SOCS1 cDNA with nucleotide sequence databases revealed that it was present on
mouse and rat genomic DNA clones containing the protamine gene cluster found on mouse
chromosome 16. Closer inspection revealed that the 1.4 kb SOCS1 sequence was not
homologous to any of the protamine genes, but rather represented a previously unidentified
open reading frame located at the extreme 3' end of these clones (Figure 3). There were no
regions of discontinuity between the sequences of the SOCS1 cDNA and genomic locus,
suggesting that SOCS1 is encoded by a single exon. In addition to the genomic clone
containing the protamine genes, a series of murine and human expressed sequence tags (ESTs)
also revealed large blocks of nucleotide sequence identity to mouse SOCS1. The sequence
information provided by the human ESTs allowed the rapid cloning of cDNAs encoding
human SOCS1.
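The single-exon inference rests on the cDNA appearing as one uninterrupted stretch of the genomic sequence. A minimal sketch of that check, assuming exact matching (no sequencing errors or polymorphisms) and hypothetical toy sequences; a real comparison would use an alignment tool rather than substring search:

```python
def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A/C/G/T/N only)."""
    pairs = {"A": "T", "T": "A", "G": "C", "C": "G", "N": "N"}
    return "".join(pairs[base] for base in reversed(seq.upper()))


def contiguous_in_genome(cdna: str, genomic: str) -> bool:
    """True if the cDNA occurs without interruption on either genomic strand.

    True is consistent with a single-exon coding region; False, when pieces
    of the cDNA do match separately, would point to intervening introns.
    """
    g = genomic.upper()
    c = cdna.upper()
    return c in g or reverse_complement(c) in g


# Hypothetical toy sequences.
genomic_single = "GGGATGCCCTTTAAAGGG"
genomic_split = "ATGCCCGTAAGTTTTAAA"  # 'GTAAGT' interrupts the coding stretch
cdna = "ATGCCCTTTAAA"

print(contiguous_in_genome(cdna, genomic_single))  # True
print(contiguous_in_genome(cdna, genomic_split))   # False
```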

The mouse and rat SOCS1 genes encode a 212 amino acid protein, whereas the human SOCS1
gene encodes a 211 amino acid protein. Mouse, rat and human SOCS1 proteins share 95-99%
amino acid identity (Figure 9). A search of translated nucleic acid databases with the predicted
amino acid sequence of SOCS1 showed that it was most related to a recently cloned
cytokine-inducible immediate early gene product, CIS, and two classes of ESTs. Full length
cDNAs from the two classes of ESTs were isolated and found to encode proteins of similar
length and overall structure to SOCS1 and CIS. These clones were given the names SOCS2
and SOCS3. Each of the four proteins contains a central SH2 domain and a C-terminal region
termed the SOCS motif. The SOCS1 proteins exhibit an extremely high level of amino acid
sequence similarity (95-99% identity) amongst different species. However, SOCS1, SOCS2,
SOCS3 and CIS from the same animal, while clearly defining a new family of SH2-containing
proteins, exhibit a lower level of amino acid identity. SOCS2 and CIS exhibit
approximately 38% amino acid identity, while the remaining members of the family share
approximately 25% amino acid identity (Figure 9). The coding regions of the genes for SOCS1
and SOCS3 appear to contain no introns, while the coding regions of the genes for SOCS2 and
CIS contain one and two introns, respectively.
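The identity figures above (95-99% between SOCS1 orthologues, roughly 38% and 25% between family members) are percentages of matching residues in a pairwise alignment. Conventions differ in how gap columns are counted; the sketch assumes one common choice (identical residues divided by alignment columns that are not gaps in both sequences) and uses hypothetical pre-aligned toy peptides, not the SOCS sequences:

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity of two pre-aligned sequences ('-' marks a gap).

    Denominator: alignment columns that are not gaps in both sequences.
    Numerator: columns where both residues are identical (gaps never match).
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must be the same length")
    columns = [(a, b) for a, b in zip(aligned_a, aligned_b)
               if not (a == "-" and b == "-")]
    if not columns:
        raise ValueError("alignment contains no residues")
    matches = sum(1 for a, b in columns if a == b and a != "-")
    return 100.0 * matches / len(columns)


print(round(percent_identity("MKT-AL", "MKTQAV"), 1))  # 66.7
print(percent_identity("ACDE", "ACDE"))                # 100.0
```

Because the denominator choice changes the result, identity values are only directly comparable when computed under the same convention.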

The Genbank Accession Numbers for the sequences referred to herein are mouse SOCS1
cDNA (U88325), human SOCS1 cDNA (U88326), mouse SOCS2 cDNA (U88327) and mouse
SOCS3 cDNA (U88328).

EXAMPLE 13

CONSTITUTIVE EXPRESSION OF SOCSl SUPPRESSES THE 
ACTION OF A RANGE OF CYTOKINES 

To formally establish that the phenotype of the 4A2 cell line was directly related to expression
of SOCS1, and not to unrelated genetic changes which may have occurred independently in
these cells, a cDNA encoding an epitope-tagged version of SOCS1 under the control of the
EF1α promoter was transfected into parental M1 cells, and M1 cells expressing the receptor
for thrombopoietin, c-mpl (M1.mpl). Transfection of the SOCS1 expression vector into both
cell lines resulted in an increase in the frequency of IL-6-unresponsive M1 cells.

Multiple independent clones of M1 cells expressing SOCS1, as detected by Western blot,
displayed a cytokine-unresponsive phenotype that was indistinguishable from 4A2. Further, if
transfectants were not maintained in puromycin, expression of SOCS1 was lost over time and
cells regained their cytokine responsiveness. In the absence of cytokine, colonies derived from
4A2 and other SOCS1-expressing clones characteristically grew to a smaller size than colonies
formed by control M1 cells (Figure 10).

The effect of constitutive SOCS1 expression on the response of M1 cells to a range of
cytokines was investigated using the 4A2 cell line and a clone of M1.mpl cells expressing
SOCS1 (M1.mpl.SOCS1). Unlike parental M1 cells and M1.mpl cells, the two cell lines
expressing SOCS1 continued to proliferate and failed to form differentiated colonies in response
to IL-6, LIF, OSM, IFN-γ or, in the case of the M1.mpl.SOCS1 cell line, thrombopoietin
(Figure 4). For both cell lines, however, a normal response to dexamethasone was observed,
suggesting that SOCS1 specifically affected cytokine signal transduction rather than
differentiation per se. Consistent with these data, while parental M1 cells and M1.mpl cells
became large and vacuolated in response to IL-6, 4A2 and M1.mpl.SOCS1 cells showed no
evidence of morphological differentiation in response to IL-6 or other cytokines (Figure 5).

EXAMPLE 14 

SOCS1 INHIBITS A RANGE OF IL-6 SIGNAL TRANSDUCTION
PROCESSES, INCLUDING STAT3 PHOSPHORYLATION
AND ACTIVATION

Phosphorylation of the cell surface receptor component gp130, the cytoplasmic tyrosine kinase
JAK1 and the transcription factor STAT3 is thought to play a central role in IL-6 signal
transduction. These events were compared in the parental M1 and M1.mpl cell lines and their
SOCS1-expressing counterparts. As expected, gp130 was phosphorylated rapidly in response
to IL-6 in both parental lines; however, this was reduced five- to ten-fold in the cell lines
expressing SOCS1 (Figure 6). Likewise, STAT3 phosphorylation in response to IL-6 was
reduced approximately ten-fold in those cell lines expressing SOCS1 (Figure 6).
Consistent with a reduction in STAT3 phosphorylation, activation of specific STAT DNA
binding complexes, as determined by electrophoretic mobility shift assay, was also reduced.
Notably, there was a reduction in the formation of SIF-A (containing STAT3), SIF-B
(STAT1/STAT3 heterodimer) and SIF-C (containing STAT1), the three STAT complexes
induced in M1 cells stimulated with IL-6 (Figure 7). Similarly, constitutive expression of
SOCS1 also inhibited IFN-γ-stimulated formation of p91 homodimers (Figure 7). STAT
phosphorylation and activation were not the only cytoplasmic processes to be affected by
SOCS1 expression, as the phosphorylation of other proteins, including Shc and MAP kinase,
was reduced to a similar extent (Figure 7).

EXAMPLE 15 

TRANSCRIPTION OF THE SOCS1 GENE IS STIMULATED BY IL-6
IN VITRO AND IN VIVO

Although SOCS1 can inhibit cytokine signal transduction when constitutively expressed in M1
cells, this does not necessarily indicate that SOCS1 normally functions to negatively regulate
an IL-6 response. In order to investigate this possibility the inventors determined whether
transcription of the SOCS1 gene is regulated in the response of M1 cells to IL-6 and, because
of the critical role IL-6 plays in regulating the acute phase response to injury and infection, in
the response of the liver to intravenous injection of 5 mg IL-6. In the absence of IL-6, SOCS1
mRNA was undetectable in either M1 cells or the liver. However, in both, a 1.4 kb SOCS1
transcript was induced within 20 to 40 minutes by IL-6 (Figure 8). For M1 cells, where IL-6
was present throughout the experiment, the level of SOCS1 mRNA remained elevated
(Figure 8). In contrast, IL-6 administered in vivo by a single intravenous injection was rapidly
cleared from the circulation, resulting in a pulse of IL-6 stimulation to the liver.
Consistent with this, transient expression of SOCS1 mRNA was detectable in the liver, peaking
approximately 40 minutes after injection and declining to basal levels within 4 hours (Figure 8).

EXAMPLE 16 

REGULATION OF SOCS GENES

Since CIS was cloned as a cytokine-inducible immediate early gene, the inventors examined
whether SOCS1, SOCS2 and SOCS3 were similarly regulated. The basal pattern of expression
of the four SOCS genes was examined by Northern blot analysis of mRNA from a variety of
tissues from male and female C57BL/6 mice (Figure 11A). Constitutive expression of SOCS1
was observed in the thymus and to a lesser extent in the spleen and the lung. SOCS2
expression was restricted primarily to the testis and in some animals the liver and lung; for
SOCS3 a low level of expression was observed in the lung, spleen and thymus, while CIS
expression was more widespread, including the testis, heart, lung, kidney and, in some animals,
the liver.

The inventors sought to determine whether expression of the four SOCS genes was regulated
by IL-6. Northern blots of mRNA prepared from the livers of untreated and IL-6-injected
mice, or from unstimulated and IL-6-stimulated M1 cells, were hybridised with labelled
fragments of SOCS1, SOCS2, SOCS3 and CIS cDNAs (Figure 11B). Expression of all four
SOCS genes was increased in the liver following IL-6 injection; however, the kinetics of
induction appeared to differ. Expression of SOCS1 and SOCS3 was transient in the liver, with
mRNA detectable 20 minutes after IL-6 injection and declining to basal levels within 4 hours
for SOCS1 and 8 hours for SOCS3. Induction of SOCS2 and CIS mRNA in the liver followed
similar initial kinetics to that of SOCS1, but was maintained at an elevated level for at least 24
hours. A similar induction of SOCS gene mRNA was observed in other organs, notably the
lung and the spleen. In contrast, in M1 cells, while SOCS1 and CIS mRNA were induced by
IL-6, no induction of either SOCS2 or SOCS3 expression was detected. This result highlights
cell type-specific differences in the expression of SOCS family genes in response to the same
cytokine.

In order to examine the spectrum of cytokines capable of inducing transcription of the
various members of the SOCS gene family, bone marrow cells were stimulated for an hour with
a range of cytokines, after which mRNA was extracted and cDNA was synthesised. PCR was
then used to assess the expression of SOCS1, SOCS2, SOCS3 and CIS (Figure 11C). In the
absence of stimulation, little or no expression of any of the SOCS genes was detectable in bone
marrow by PCR. A broad array of cytokines appeared capable of upregulating mRNA for one
or more members of the SOCS family. IFN-γ, for example, induced expression of all four
SOCS genes, while erythropoietin, granulocyte colony-stimulating factor,
granulocyte-macrophage colony-stimulating factor and interleukin-3 induced expression of
SOCS2, SOCS3 and CIS. Interestingly, tumor necrosis factor alpha, macrophage
colony-stimulating factor and interleukin-1, which act through receptors that do not fall into
the type I cytokine receptor class, also appeared capable of inducing expression of SOCS3 and
CIS, suggesting that SOCS proteins may play a broader role in regulating signal transduction.

As constitutive expression of SOCS1 inhibited the response of M1 cells to a range of cytokines,
the inventors examined whether phosphorylation of the cell surface receptor component gp130
and the transcription factor STAT3, which are thought to play a central role in IL-6 signal
transduction, was affected. These events were compared in the parental M1 and M1.mpl cell
lines and their SOCS1-expressing counterparts. As expected, gp130 was phosphorylated
rapidly in response to IL-6 in both parental lines; however, this was reduced in the cell lines
expressing SOCS1 (Figure 12A). Likewise, STAT3 phosphorylation in response to IL-6 was
also reduced in those cell lines expressing SOCS1 (Figure 12A). Consistent with a
reduction in STAT3 phosphorylation, activation of specific STAT/DNA binding complexes, as
determined by electrophoretic mobility shift assay, was also reduced. Notably, there was a
failure to form SIF-A (containing STAT3) and SIF-B (STAT1/STAT3 heterodimer), the major
STAT complexes induced in M1 cells stimulated with IL-6 (Figure 12B). Similarly, constitutive
expression of SOCS1 also inhibited IFN-γ-stimulated formation of SIF-C (STAT1 homodimer;
Figure 12B). These experiments are consistent with the proposal that SOCS1 inhibits signal
transduction upstream of receptor and STAT phosphorylation, potentially at the level of the
JAK kinases.

The ability of SOCS1 to inhibit signal transduction and ultimately the biological response to
cytokines suggests that, like the SH2-containing phosphatase SHP-1 [Ihle et al 1994; Yi et al,
199S], the SOCS proteins may play a central role in controlling the intensity and/or duration
of a cell's response to a diverse range of extracellular stimuli by suppressing the signal
transduction process. The evidence provided here indicates that the SOCS family acts in a
classical negative feedback loop for cytokine signal transduction. Like other genes such as
OSM, expression of genes encoding the SOCS proteins is induced by cytokines through the
activation of STATs. It is proposed that, once expressed, the SOCS proteins inhibit the activity
of JAKs and so reduce the phosphorylation of receptors and STATs, thereby suppressing signal
transduction and any ensuing biological response. Importantly, inhibition of STAT activation
will, over time, lead to a reduction in SOCS gene expression, allowing cells to regain
responsiveness to cytokines.

EXAMPLE 17 

DATABASE SEARCHES

The NCBI genetic sequence database (Genbank), which encompasses the major database of
expressed sequence tags (ESTs), and the TIGR database of human expressed sequence tags were
searched for sequences with similarity to a consensus SOCS box sequence using the TFASTA
and MOTIF/PATTERN algorithms [Pearson, 1990; Cockwell and Giles, 1989]. Using the
software package SRS [Etzold et al 1996], ESTs that exhibited similarity to the SOCS box
(and their partners derived from sequencing the other end of cDNAs) were retrieved and
assembled into contigs using Autoassembler (Applied Biosystems, Foster City, CA). Consensus
nucleotide sequences derived from overlapping ESTs were then used to search the various
databases using BLASTN [Altschul et al, 1990]. Again, positive ESTs were retrieved and
added to the contig. This process was repeated until no additional ESTs could be recovered.
Final consensus nucleotide sequences were then translated using Sequence Navigator (Applied
Biosystems, Foster City, CA).
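The retrieve, extend and repeat procedure above amounts to growing a cluster of overlapping ESTs until it stops changing. A minimal sketch, assuming exact shared k-mers as a hypothetical stand-in for the BLASTN similarity search and Autoassembler contig building, and a toy in-memory dictionary in place of the Genbank and TIGR databases (mate-pair "partner" retrieval is not modelled):

```python
def kmers(seq: str, k: int) -> set:
    """All length-k substrings of seq (empty if seq is shorter than k)."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def grow_est_cluster(seeds: set, est_db: dict, k: int = 12) -> set:
    """Iteratively add ESTs that share a k-mer with the current cluster.

    Each pass searches with the sequences already collected, adds any new
    hits, and the loop stops when a pass recovers no additional ESTs.
    """
    cluster = set(seeds)
    while True:
        cluster_kmers = set()
        for est_id in cluster:
            cluster_kmers |= kmers(est_db[est_id], k)
        new_hits = {est_id for est_id, seq in est_db.items()
                    if est_id not in cluster and kmers(seq, k) & cluster_kmers}
        if not new_hits:
            return cluster
        cluster |= new_hits


# Hypothetical toy EST database; "e1" plays the seed that matched the SOCS box.
est_db = {
    "e1": "AAAACCCCGGGG",
    "e2": "CCGGGGTTTTAA",
    "e3": "TTTTAACCAGTA",
    "e4": "GATCGATCGATC",
}
print(sorted(grow_est_cluster({"e1"}, est_db, k=6)))  # ['e1', 'e2', 'e3']
```

Note that "e3" is reached only on the second pass, through its overlap with "e2", which is why the search must be repeated until closure rather than run once.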

The ESTs encoding the new SOCS proteins are as follows: human SOCS4 (ESTS 1149,
EST180909, EST182619, ya99h09, ye70c04, yh53c09, yh77g11, yh87h05, yi45h07, yj04e06,
yq12h06, yq56a06, yq60e02, yq92g03, yq97h06, yr90fi)l, yt69c03, yv30a08, yv55f07,
yv57h09, yv87h02, yv98e11, yw68d10, yw82a03, yx08a07, yx72h06, yx76b09, yy37h08,
yy66b02, za81fl)8, zbl8ff)7, zc06e08, zd14g06, zd51h12, zd52b09, ze25g11, ze69f02, zf54f03,
zh96e07, zv66h12, zs83a08 and zs83g08); mouse SOCS-4 (mc65f04, mf42e06, mp10c10,
nir81g09 and mt19h12); human SOCS-5 (EST15B103, EST15B105, EST27530 and
zfSOfOl); mouse SOCS-5 (mc55a01, mh98fD9, my26h12 and ve24e06); human SOCS-6
(yf51e08, yf93a09, yg05f12, yg41fl)4, yg45c02, yh11f10, yh13b05, zc35a12, ze02h08, zl09a03,
zl69e10, zn39d08 and zo39e06); mouse SOCS-6 (mc04c05, md48a03, mf31d03, mh26b07,
mh78e11, mh88h09, mh94h07, nii27h04, mj29c05, mp66g04, mw75g03, va53b05,
vb34h02, vc55d07, vc59e05, vc67d03, vc68d10, vc97h01, vc99c08, vd07h03, vd08c01,
vd09b12, vd19b02, vd29a04 and vd46d06); human SOCS-7 (STS WI30171, EST00939,
EST12913, yc29b05, yp49f10, zt10f03 and zx73g04); mouse SOCS-7 (mj39a01 and
vi52h07); mouse SOCS-8 (mj6e09 and vj27a029); human SOCS-9 (CSRL-82f2-u,
EST114054, yy06b07, yy06g06, zr40c09, zr72h01, yx92c08, yx93b08 and hfe0662); mouse
SOCS-9 (me65d05); human SOCS-10 (aa48h10, zp35h01, zp97h12, zqOShOl, zr34g05,
EST73000 and HSDHEI005); mouse SOCS-10 (mb14d12, mb40f06, mg89b11, mq89e12,
mp03g12 and vh53c11); human SOCS-11 (zt24h06 and zr43b02); human SOCS-IS
(EST59161); mouse SOCS-13 (ma39a09, me60c05, mi78g05, mk10c11, mo48g12, mp94a01,
vb57c07 and vh07c11); human SOCS-14 (mi75e03, vd29h11 and vd53g07).

EXAMPLE 18
cDNA CLONING

Based on the consensus sequences derived from overlapping ESTs, oligonucleotides were
designed that were specific to various members of the SOCS family. As described above,
oligonucleotides were labelled and used to screen commercially available genomic and cDNA
libraries cloned in λ bacteriophage. Genomic and/or cDNA clones covering the entire coding
region of mouse SOCS4, mouse SOCS5 and mouse SOCS6 were isolated. The entire gene for
SOCS15 is on the human 12p13 BAC (Genbank Accession Number HSU47924) and the mouse
chromosome 6 BAC (Genbank Accession Number AC002393). Partial cDNAs for mouse
SOCS7, SOCS9, SOCS10, SOCS11, SOCS12, SOCS13 and SOCS14 were also isolated.

EXAMPLE 19 
NORTHERN BLOTS AND rtPCR 

Northern blots were performed as described above. The sources of hybridisation probes were
as follows: (i) the entire coding region of the mouse SOCS1 cDNA, (ii) a 1059 bp PCR product
derived from the coding region of SOCS5 upstream of the SH2 domain, (iii) the entire coding
region of the mouse SOCS6 cDNA, (iv) a 790 bp PCR product derived from the coding region
of a partial SOCS7 cDNA and (v) a 1200 bp PstI fragment of the chicken glyceraldehyde
3-phosphate dehydrogenase (GAPDH) cDNA.

EXAMPLE 20 
ADDITIONAL MEMBERS OF SOCS FAMILY 

SOCS1, SOCS2 and SOCS3 are members of the SOCS protein family identified in Examples
1-16. Each contains a central SH2 domain and a conserved motif at the C-terminus, named the
SOCS box. In order to isolate further members of this protein family, various DNA databases
were searched with the amino acid sequence corresponding to conserved residues of the SOCS
box. This search revealed the presence of human and mouse ESTs encoding twelve further
members of the SOCS protein family (Figure 13). Using this sequence information cDNAs



SUBSTITUTE SHEET (Rule 26\ 



wo 98/20023 



PCT/AU97/00729 



-78- 

encoding SOCS4, SOCS5, SOCS6, SOCS7, SOCS9, SOCS10, SOCS11, SOCS12, SOCS13,
SOCS14 and SOCS15 have been isolated. Further analysis of contigs derived from ESTs and
cDNAs revealed that the SOCS proteins could be placed into three groups according to their
predicted structure N-terminal of the SOCS box. The three groups are those with (i) SH2
domains, (ii) WD-40 repeats and (iii) ankyrin repeats.

EXAMPLE 21
SOCS PROTEINS WITH SH2 DOMAINS

Eight SOCS proteins with SH2 domains have been identified. These include CIS, SOCS1, SOCS2,
SOCS3, SOCS5, SOCS9, SOCS11 and SOCS14 (Figure 13). Full length cDNAs were
isolated for mouse SOCS5 and SOCS14 and partial clones encoding mouse SOCS9 and
SOCS14. Analysis of primary amino acid sequence and genomic structure suggests that pairs
of these proteins (SOCS1 and SOCS3, SOCS2 and CIS, SOCS5 and SOCS14, and SOCS9 and
SOCS11) are most closely related (Figure 13). Indeed, the SH2 domains of SOCS5 and
SOCS14 are almost identical (Figure 13B), and unlike CIS, SOCS1, SOCS2 and SOCS3,
SOCS5 and SOCS14 have an extensive, though less well conserved, N-terminal region
preceding their SH2 domains (Figure 13A).

EXAMPLE 22 

SOCS PROTEINS WITH WD-40 REPEATS

Four SOCS proteins with WD-40 repeats were identified. As with the SOCS proteins with
SH2 domains, pairs of these proteins appeared to be closely related. Full length cDNAs of
mouse SOCS4 and SOCS6 were isolated and shown to encode proteins containing eight
WD-40 repeats N-terminal of the SOCS box (Figure 13), and SOCS4 and SOCS6 share 65%
amino acid similarity. SOCS15 was recognised as an open reading frame upon sequencing
BACs from human chromosome 12p13 and the syntenic region of mouse chromosome 6
[Ansari-Lari et al, 1997]. In the human, chimp and mouse, SOCS15 is encoded by a gene with
two coding exons that lies within a few hundred base pairs of the 3' end of the triose phosphate
isomerase (TPI) gene, but on the opposite strand to TPI (9). In addition to a C-terminal
SOCS box, the SOCS15 protein contains four WD-40 repeats. Interestingly, the EST
databases contain sequences of nematode, insect and fish relatives of SOCS15. SOCS15
appears most closely related to SOCS13.

EXAMPLE 23 

SOCS PROTEINS WITH ANKYRIN REPEATS

Three SOCS proteins with ankyrin repeats were identified. Analysis of partial cDNAs of mouse
SOCS7, SOCS10 and SOCS12 demonstrated the presence of multiple ankyrin repeats.

EXAMPLE 24

EXPRESSION PATTERN OF SOCS PROTEINS

The expression of mRNA from representative members of each class of SOCS proteins (SOCS1
and SOCS5 from the SH2 domain group, SOCS6 from the WD-40 repeat group and SOCS7
from the ankyrin repeat group) was examined. As shown above, SOCS1 mRNA is
found in abundance in the thymus and at lower levels in other adult tissues.

Since transcription of the SOCS1 gene is induced by cytokines, the inventors sought to
determine whether levels of SOCS5, SOCS6 and SOCS7 mRNA increased upon cytokine
stimulation. In the livers of mice injected with IL-6, SOCS1 mRNA is detectable after 20 min
and decreases to background levels within 2 hours. In contrast, the kinetics of SOCS5 mRNA
expression are quite different, with transcripts detectable only 12 to 24 hours after IL-6
injection. SOCS6 mRNA appears to be expressed constitutively, while SOCS7 mRNA was not
detected in the liver either before injection of IL-6 or at any time after injection.

Expression of these genes was also examined after cytokine stimulation of the factor-dependent
cell line FDCP-1 engineered to express bcl-w. Again, SOCS6 mRNA was expressed
constitutively.

EXAMPLE 25
SOCS4

Mouse and human SOCS4 were recognized through searching EST databases using the SOCS
box consensus (Figure 13). Those ESTs derived from mouse and human SOCS4 cDNAs are
tabulated below (Tables 4.1 and 4.2). Using sequence information derived from mouse ESTs,
several oligonucleotides were designed and used to screen, in the conventional manner, a mouse
thymus cDNA library cloned into λ-bacteriophage. Two cDNAs encoding mouse SOCS4
were isolated and sequenced in their entirety (Figure 15) and shown to overlap the mouse
ESTs identified in the database (Table 4.1 and Figure 17). These cDNAs include part of the
5' untranslated region, the entire mouse SOCS4 coding region and part of the 3' untranslated
region (Figure 17). Analysis of the sequence confirms that the SOCS4 cDNA encodes a
SOCS box at its C-terminus and a series of 8 WD-40 repeats before the SOCS box (Figures
16 and 17). The relationship of the two sequence contigs of human SOCS4 (h4.1 and h4.2)
to the experimentally determined mouse SOCS4 cDNA sequence is shown in Figure 17. The
nucleotide sequences of the two human contigs are listed in Figure 18.

SEQ ID NOs: 13 and 14 represent the nucleotide sequence of murine SOCS4 and the
corresponding amino acid sequence. SEQ ID NOs: 15 and 16 are SOCS4 cDNA human
contigs h4.1 and h4.2, respectively.

EXAMPLE 26

SOCS5

Mouse and human SOCS5 were recognized through searching EST databases using the SOCS
box consensus (Figure 13). Those ESTs derived from mouse and human SOCS5 cDNAs are
tabulated below (Tables 5.1 and 5.2). Using sequence information derived from mouse and
human ESTs, several oligonucleotides were designed and used to screen, in the conventional
manner, a mouse thymus cDNA library, a mouse genomic DNA library and a human thymus
cDNA library cloned into λ-bacteriophage. A single genomic DNA clone (57-2) and a single
cDNA clone (5-3-2) encoding mouse SOCS5 were isolated and sequenced in their entirety and
shown to overlap with the mouse ESTs identified in the database (Figures 19 and 20A). The
entire coding region, in addition to parts of the 5' and 3' untranslated regions of mouse SOCS5,
appears to be encoded on a single exon (Figure 19). Analysis of the sequence (Figure 20)
confirms that the SOCS5 genomic and cDNA clones encode a protein with a SOCS box at its
C-terminus in addition to an SH2 domain (Figures 19 and 20B). The relationship of the human
SOCS5 contig (h5.1; Figure 21) derived from analysis of cDNA clone 5-94-2 and the human
SOCS5 ESTs (Table 5.2) to the mouse SOCS5 DNA sequence is shown in Figure 19. The
nucleotide sequence and corresponding amino acid sequence of murine SOCS5 are shown in
SEQ ID NOs: 17 and 18, respectively. The human SOCS5 nucleotide sequence is shown in
SEQ ID NO: 19.

EXAMPLE 27

SOCS6

Mouse and human SOCS6 were recognized through searching EST databases using the SOCS
box consensus (Figure 13). Those ESTs derived from mouse and human SOCS6 cDNAs are
tabulated below (Tables 6.1 and 6.2). Using sequence information derived from mouse ESTs,
several oligonucleotides were designed and used to screen, in the conventional manner, a mouse
thymus cDNA library. Eight cDNA clones (6-1A, 6-2A, 6-5B, 6-4N, 6-18, 6-29, 6-3N, 6-5N)
encoding mouse SOCS6 were isolated and sequenced in their entirety and shown
to overlap with the mouse ESTs identified in the database (Figures 22 and 23A). Analysis of
the sequence (Figure 23) confirms that the mouse SOCS6 cDNA clones encode a protein with
a SOCS box at its C-terminus in addition to eight WD-40 repeats (Figures 22 and 23B). The
relationship of the human SOCS6 contigs (h6.1 and h6.2; Figure 24) derived from analysis of
human SOCS6 ESTs (Table 6.2) to the mouse SOCS6 DNA sequence is shown in Figure 22.
The nucleotide and corresponding amino acid sequences of murine SOCS6 are shown in SEQ
ID NOs: 20 and 21, respectively. SOCS6 human contigs h6.1 and h6.2 are shown in SEQ ID
NOs: 22 and 23, respectively.
EXAMPLE 28
SOCS7

Mouse and human SOCS7 were recognized through searching EST databases using the SOCS
box consensus (Figure 13). Those ESTs derived from mouse and human SOCS7 cDNAs are
tabulated below (Tables 7.1 and 7.2). Using sequence information derived from mouse ESTs,
several oligonucleotides were designed and used to screen, in the conventional manner, a mouse
thymus cDNA library. One cDNA clone (74-10A-11) encoding mouse SOCS7
was isolated and sequenced in its entirety and shown to overlap with the mouse ESTs identified
in the database (Figures 25 and 26A). Analysis of the sequence (Figure 26) suggests that
mouse SOCS7 encodes a protein with a SOCS box at its C-terminus, in addition to several
ankyrin repeats (Figures 25 and 26B). The relationship of the human SOCS7 contigs (h7.1 and
h7.2; Figure 27) derived from analysis of human SOCS7 ESTs (Table 7.2) to the mouse
SOCS7 DNA sequence is shown in Figure 25. The nucleotide and corresponding amino acid
sequences of murine SOCS7 are shown in SEQ ID NOs: 24 and 25, respectively. The
nucleotide sequences of SOCS7 human contigs h7.1 and h7.2 are shown in SEQ ID NOs: 26
and 27, respectively.

EXAMPLE 29 
SOCS8

ESTs derived from mouse SOCS8 cDNAs are tabulated below (Table 8.1). As described for
other members of the SOCS family, it is possible to isolate cDNAs for mouse SOCS8 using
sequence information derived from mouse ESTs. The relationship of the ESTs to the predicted
coding region of SOCS8 is shown in Figure 28. The nucleotide sequence obtained from the
ESTs is shown in Figure 29A and the partial amino acid sequence of SOCS8 is shown in Figure
29B. The nucleotide sequence and corresponding amino acid sequence for murine SOCS8 are
shown in SEQ ID NOs: 28 and 29, respectively.

EXAMPLE 30
SOCS9

Mouse and human SOCS9 were recognized through searching EST databases using the SOCS
box consensus (Figure 13). Those ESTs derived from mouse and human SOCS9 cDNAs are
tabulated below (Tables 9.1 and 9.2). The relationship of the mouse SOCS9 contig (m9.1;
Figure 9.2), derived from analysis of the mouse SOCS9 EST (Table 9.1), to the human SOCS9
DNA contig (h9.1; Figure 32), derived from analysis of human SOCS9 ESTs (Table 9.2), is
shown in Figure 31. Analysis of the sequence (Figure 32) indicates that the human SOCS9
cDNA encodes a protein with a SOCS box at its C-terminus, in addition to an SH2 domain
(Figure 30). The nucleotide sequence of murine SOCS9 cDNA is shown in SEQ ID NO: 30.
The nucleotide sequence of human SOCS9 cDNA is shown in SEQ ID NO: 31.

EXAMPLE 31 
SOCS10

Mouse and human SOCS10 were recognized through searching EST databases using the SOCS
box consensus (Figure 13). Those ESTs derived from mouse and human SOCS10 cDNAs are
tabulated below (Tables 10.1 and 10.2). Using sequence information derived from mouse ESTs,
several oligonucleotides were designed and used to screen, in the conventional manner, a mouse
thymus cDNA library. Four cDNA clones (10-9, 10-12, 10-23 and 10-24) encoding mouse
SOCS10 were isolated, sequenced in their entirety and shown to overlap with the mouse and
human ESTs identified in the database (Figures 33 and 34). Analysis of the sequence (Figure
34) indicates that the mouse SOCS10 cDNA clone is not full length, but that it does encode a
protein with a SOCS box at its C-terminus, in addition to several ankyrin repeats (Figure 33).
The relationship of the human SOCS10 contigs (h10.1 and h10.2; Figure 35), derived from
analysis of human SOCS10 ESTs (Table 10.2), to the mouse SOCS10 DNA sequence is shown
in Figure 33. Comparison of mouse cDNA clones and ESTs with human ESTs suggests that
the 3' untranslated regions of mouse and human SOCS10 differ significantly. The nucleotide
sequence of murine SOCS10 is shown in SEQ ID NO: 32 and the nucleotide sequences of
SOCS10 human contigs h10.1 and h10.2 are shown in SEQ ID NOs: 33 and 34, respectively.

EXAMPLE 32

SOCS11
Human SOCS11 was recognized through searching EST databases using the SOCS box
consensus (Figure 13). Those ESTs derived from human SOCS11 cDNAs are tabulated below
(Tables 11.1 and 11.2). The relationship of the human SOCS11 contig (h11.1; Figures 36A and
36B), derived from analysis of the ESTs (Table 11.2), to the predicted encoded protein is
shown in Figure 37. Analysis of the sequence indicates that the human SOCS11 cDNA encodes
a protein with a SOCS box at its C-terminus, in addition to an SH2 domain (Figures 37 and 36B).
The nucleotide sequence and corresponding amino acid sequence of human SOCS11 are
represented in SEQ ID NOs: 35 and 36, respectively.

EXAMPLE 33

SOCS12 

Mouse and human SOCS12 were recognized through searching EST databases using the
SOCS box consensus (Figure 13). Those ESTs derived from mouse and human SOCS12
cDNAs are tabulated below (Tables 12.1 and 12.2). Using sequence information derived from
mouse ESTs, several oligonucleotides were designed and used to screen, in the conventional
manner, a mouse thymus cDNA library. Four cDNA clones (10-9, 10-12, 10-23 and 10-24)
encoding mouse SOCS12 were isolated, sequenced in their entirety and shown to overlap with
the mouse and human ESTs identified in the database (Figures 38 and 39). Analysis of the
sequence (Figures 39 and 40) indicates that the SOCS12 cDNA clone encodes a protein with
a SOCS box at its C-terminus, in addition to several ankyrin repeats (Figure 38). The
relationship of the human SOCS12 contigs (h12.1 and h12.2; Figure 40), derived from analysis
of human SOCS12 ESTs (Table 12.2), to the mouse SOCS12 DNA sequence is shown in Figure
38. Comparison of mouse cDNA clones and ESTs with human ESTs suggests that the 3'
untranslated regions of mouse and human SOCS12 differ significantly. The nucleotide
sequence of SOCS12 is shown in SEQ ID NO: 37. The nucleotide sequences of human SOCS12
contigs h12.1 and h12.2 are shown in SEQ ID NOs: 38 and 39, respectively.

EXAMPLE 34 
SOCS13
Mouse and human SOCS13 were recognized through searching EST databases using the
SOCS box consensus (Figure 13). Those ESTs derived from mouse and human SOCS13
cDNAs are tabulated below (Tables 13.1 and 13.2). Using sequence information derived from
mouse ESTs, several oligonucleotides were designed and used to screen, in the conventional
manner, a mouse thymus and a mouse embryo cDNA library. Three cDNA clones (62-1, 62-6-7
and 62-14) encoding mouse SOCS13 were isolated, sequenced in their entirety and shown
to overlap with the mouse ESTs identified in the database (Figures 41 and 42A). Analysis of the
sequence (Figure 42) indicates that the mouse SOCS13 cDNA encodes a protein with a SOCS
box at its C-terminus, in addition to a potential WD-40 repeat (Figures 41 and 42B). The
relationship of the human SOCS13 contigs (h13.1 and h13.2; Figure 43), derived from analysis
of human SOCS13 ESTs (Table 13.2), to the mouse SOCS13 DNA sequence is shown in Figure
41. The nucleotide sequence and corresponding amino acid sequence of murine SOCS13 are
shown in SEQ ID NOs: 40 and 41, respectively. The nucleotide sequence of human SOCS13
contig h13.1 is shown in SEQ ID NO: 42.

EXAMPLE 35 
SOCS14 

Mouse and human SOCS14 were recognized through searching EST databases using the
SOCS box consensus (Figure 13). Those ESTs derived from mouse and human SOCS14
cDNAs are tabulated below (Tables 14.1 and 14.2). Using sequence information derived from
mouse and human ESTs, several oligonucleotides were designed and used to screen, in the
conventional manner, a mouse thymus cDNA library, a mouse genomic DNA library and a
human thymus cDNA library cloned into λ-bacteriophage. A single genomic DNA clone (57-2)
and a cDNA clone (5-3-2) encoding mouse SOCS14 were isolated, sequenced in their
entirety and shown to overlap with the mouse ESTs identified in the database (Figures 44 and
45A). The entire coding region, together with portions of the 5' and 3' untranslated regions, of
mouse SOCS14 appears to be encoded on a single exon (Figure 44). Analysis of the sequence
(Figure 45) confirms that SOCS14 genomic and cDNA clones encode a protein with a SOCS
box at its C-terminus in addition to an SH2 domain (Figures 44 and 45B). The relationship of
the human SOCS14 contig (h14.1; Figure 14.3), derived from analysis of cDNA clone 5-94-2
and the human SOCS14 ESTs (Table 14.2), to the mouse SOCS14 DNA sequence is shown
in Figure 44.

The nucleotide sequence and corresponding amino acid sequence of murine SOCS14 are
shown in SEQ ID NOs: 43 and 44, respectively.

EXAMPLE 36
SOCS15 

Mouse and human SOCS15 were recognized through searching DNA databases using the
SOCS box consensus (Figure 13). Those ESTs derived from mouse and human SOCS15
cDNAs are tabulated below (Tables 15.1 and 15.2), as are a mouse and a human BAC that
contain the entire mouse and human SOCS15 genes. Using sequence information derived from
the ESTs and the BACs it is possible to predict the entire amino acid sequence of SOCS15 and,
as described for the other SOCS genes, it is feasible to design specific oligonucleotide probes
to allow cDNAs to be isolated. The relationship of the BACs to the ESTs is shown in Figure
46 and the nucleotide and predicted amino acid sequences of SOCS15 derived from the
mouse and human BACs are shown in Figures 47 and 48. The nucleotide sequence and
corresponding amino acid sequence of murine SOCS15 are shown in SEQ ID NOs: 46 and 47,
respectively. The nucleotide and corresponding amino acid sequences of human SOCS15 are
shown in SEQ ID NOs: 48 and 49, respectively.

EXAMPLE 37 
SOCS INTERACTION WITH JAK2 KINASE 

This Example shows interaction between SOCS proteins and JAK2 kinase. Interaction is mediated via
the SH2 domain of SOCS1, SOCS2, SOCS3 and CIS. The interaction resulted in inhibition of JAK2 kinase
activity by SOCS1 (Figure 49). General interaction between JAK2 and SOCS1, SOCS2, SOCS3 and CIS
is shown in Figure 50.

The following methods were employed:

Immunoprecipitation: Cos 6 cells were transiently transfected by electroporation and cultured
for 48 hours. Cells were then lysed on ice in lysis buffer (50 mM Tris/HCl, pH 7.5, 150 mM
NaCl, 1% v/v Triton X-100, 1 mM EDTA, 1 mM NaF, 1 mM Na3VO4) with the addition of
complete protease inhibitors (Boehringer Mannheim), centrifuged at 4°C (14,000 x g, 10 min)
and the supernatant retained for immunoprecipitation. JAK2 proteins were immunoprecipitated
using 5 µl anti-JAK2 antibody (UBI). Antigen-antibody complexes were recovered using
protein A-Sepharose (30 µl of a 50% slurry).
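The lysis buffer above is defined only by its final concentrations. As a minimal sketch of how a working batch might be assembled, the Python below applies C1V1 = C2V2 to a set of stock solutions. The stock concentrations (1 M Tris/HCl, 5 M NaCl, 10% v/v Triton X-100, 0.5 M EDTA, 1 M NaF, 100 mM Na3VO4) are assumptions, not values from the source, and the protease inhibitors, which are added separately, are left out.

```python
# Final concentrations from the lysis buffer recipe (mM, or % v/v for Triton X-100).
FINAL = {
    "Tris/HCl pH 7.5": 50.0,
    "NaCl": 150.0,
    "Triton X-100": 1.0,
    "EDTA": 1.0,
    "NaF": 1.0,
    "Na3VO4": 1.0,
}

# ASSUMED stock concentrations, in the same units as FINAL; not stated in the source.
STOCK = {
    "Tris/HCl pH 7.5": 1000.0,
    "NaCl": 5000.0,
    "Triton X-100": 10.0,
    "EDTA": 500.0,
    "NaF": 1000.0,
    "Na3VO4": 100.0,
}


def stock_volumes_ml(total_ml: float) -> dict[str, float]:
    """Volume (ml) of each stock for total_ml of buffer, using C1*V1 = C2*V2."""
    volumes = {name: FINAL[name] * total_ml / STOCK[name] for name in FINAL}
    # The remaining volume is made up with water.
    volumes["water"] = total_ml - sum(volumes.values())
    return volumes


if __name__ == "__main__":
    for name, ml in stock_volumes_ml(50.0).items():
        print(f"{name:16s} {ml:7.2f} ml")
```

With these assumed stocks, 50 ml of buffer needs 2.5 ml Tris/HCl, 1.5 ml NaCl, 5 ml Triton X-100, 0.1 ml EDTA, 0.05 ml NaF and 0.5 ml Na3VO4, made up with about 40.35 ml water.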

Western blotting: Immunoprecipitates were analysed by sodium dodecyl sulphate (SDS)-polyacrylamide
gel electrophoresis (PAGE) under reducing conditions. Protein was then
electrophoretically transferred to nitrocellulose, blocked overnight in 10% w/v skim milk and
washed in PBS/0.1% v/v Tween-20 (Sigma) (wash buffer) prior to incubation for 2 hr with either
anti-phosphotyrosine antibody 4G10 (1:5000, UBI), anti-FLAG antibody (1.6 µg/ml) or anti-JAK2
antibody (1:2000, UBI) diluted in wash buffer/1% w/v BSA. Nitrocellulose blots were
washed and primary antibody detected with either peroxidase-conjugated sheep anti-rabbit
immunoglobulin (1:5000, Silenus) or peroxidase-conjugated sheep anti-mouse immunoglobulin
(1:5000, Silenus) diluted in wash buffer/1% w/v BSA. Blots were washed and antibody binding
visualised using the enhanced chemiluminescence (ECL) system (Amersham, UK) according
to the manufacturer's instructions.

In vitro kinase assay: An in vitro kinase assay was performed to assess intrinsic JAK2 kinase
catalytic activity. JAK2 proteins were immunoprecipitated as described, washed twice in kinase
assay buffer (50 mM NaCl, 5 mM MgCl2, 5 mM MnCl2, 1 mM NaF, 1 mM Na3VO4, 10 mM
HEPES, pH 7.4) and resuspended in an equal volume of kinase buffer containing 0.25 µCi/ml
[γ-32P]ATP (30 min, room temperature). Excess [γ-32P]ATP was removed and the
immunoprecipitates analysed by SDS/PAGE under reducing conditions. Gels were subjected
to mild alkaline hydrolysis by treatment with 1 M KOH (55°C, 2 hours) to remove
phosphoserine and phosphothreonine. Radioactive bands were visualised with ImageQuant
software on a PhosphorImager system (Molecular Dynamics, Sunnyvale, CA, USA).

EXAMPLE 38 
MAKING SOCS-1 KNOCKOUT CONSTRUCTS 

Diagrams of plasmid constructs and knockout constructs are shown in Figures 51-53. The
genomic SOCS-1 clone 95-11-10 was digested with the restriction enzymes BamHI and EcoRI
to obtain a 3.6 kb DNA fragment 3' of the coding region (SOCS-1 exon), which was used as
the 3' arm in the SOCS-1 knockout vectors. The ends of this fragment were then blunted. This
fragment was then ligated into the following vectors:

pBgalpAloxNeo
and pBgalpAloxNeoTK

which had been linearized at the unique XhoI site and then blunted. This ligation resulted in the
formation of the following vectors:

3'SOCS-1 arm in pBgalpAloxNeo
and 3'SOCS-1 arm in pBgalpAloxNeoTK

The 5' arm of the SOCS-1 knockout vectors was constructed by using PCR to generate a 2.5 kb
PCR product from the genomic SOCS-1 clone 95-11-10 just 5' of the SOCS-1 coding region
(SOCS-1 exon). The oligonucleotides used to generate this product were:
5' oligo (sense) (2465) 

AGCT AGA TCT GGA CCC TAC AAT GGC AGC [SEQ ID NO:49] 

15 

3' oligo (antisense) (2466) 

AGCT AG ATC TGC CAT CCT ACT CGA GGG GCC AGC TGG [SEQ ID NO:50] 
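Both primers begin with the same tail, AGCT followed by AGATCT, the BglII recognition sequence (A^GATCT). BglII leaves the same 5'-GATC overhang as BamHI (G^GATCC), which is why a BglII-cut 5' arm can be ligated into BamHI-linearized vectors. The Python sketch below, whose helper names are illustrative and not from the source, locates the site in each primer.

```python
BGLII_SITE = "AGATCT"  # BglII recognition sequence; cuts A^GATCT

# Primer sequences from SEQ ID NO:49 and SEQ ID NO:50, spaces removed.
PRIMERS = {
    "2465 (sense)": "AGCTAGATCTGGACCCTACAATGGCAGC",
    "2466 (antisense)": "AGCTAGATCTGCCATCCTACTCGAGGGGCCAGCTGG",
}

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of an upper-case DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def find_sites(seq: str, site: str) -> list[int]:
    """0-based start positions of every (possibly overlapping) match of site in seq."""
    hits = []
    pos = seq.find(site)
    while pos != -1:
        hits.append(pos)
        pos = seq.find(site, pos + 1)
    return hits


if __name__ == "__main__":
    # AGATCT is palindromic, so the site reads the same on both strands.
    assert reverse_complement(BGLII_SITE) == BGLII_SITE
    for name, seq in PRIMERS.items():
        print(f"{name}: {len(seq)} nt, BglII site at {find_sites(seq, BGLII_SITE)}")
```

Each primer reports a single BglII site at position 4 (0-based), immediately after the AGCT tail.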

The PCR product was then digested with the restriction enzyme BglII to generate BglII ends
on the PCR product. This 5' SOCS-1 PCR product, with BglII ends, was then ligated into:

3'SOCS-1 arm in pBgalpAloxNeo and 3'SOCS-1 arm in pBgalpAloxNeoTK, which had been
linearized with the unique restriction enzyme BamHI. This resulted in the following vectors
being formed:

5'&3'SOCS-1 arms in pBgalpAloxNeo
and 5'&3'SOCS-1 arms in pBgalpAloxNeoTK

These were the final SOCS-1 knockout constructs. In both constructs the entire
SOCS-1 coding region (SOCS-1 exon) was deleted and replaced with portions of the βgal, β-globin
polyA, PGK promoter, neomycin and PGK polyA sequences. The 5'&3'SOCS-1 arms in
pBgalpAloxNeoTK vector also contained the thymidine kinase gene sequence, between the
neomycin and PGK polyA sequences.
The vectors 5'&3'SOCS-1 arms in pBgalpAloxNeo
and 5'&3'SOCS-1 arms in pBgalpAloxNeoTK
were linearized with the unique restriction enzyme NotI and then transfected into embryonic
stem cells by electroporation. Clones which were resistant to neomycin were selected and
analysed by Southern blot to determine if they contained the correctly integrated SOCS-1
targeting sequence. In order to determine if correct integration had occurred, genomic DNA
from the neomycin-resistant clones was digested with the restriction enzyme EcoRI. The
digested DNA was then blotted onto nylon filters and probed with a 1.5 kb EcoRI/HindIII
DNA fragment, which lay further 5' than the 5' arm sequence used in the knockout constructs.
The band sizes expected for correct integration were:

Wild type SOCS-1 allele: 5.4 kb
SOCS-1 knockout allele: 8.2 kb in 5'&3'SOCS-1 arms in pBgalpAloxNeo
or 11 kb in 5'&3'SOCS-1 arms in pBgalpAloxNeoTK transformed cells.
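The expected EcoRI fragment sizes can be read as a simple screening rule: a correctly targeted ES cell clone, which normally carries one targeted and one wild-type allele, should show both the 5.4 kb wild-type band and the construct-specific knockout band. The Python sketch below is hypothetical; the function name and the ±0.3 kb sizing tolerance are assumptions, not values from the source.

```python
WILD_TYPE_KB = 5.4  # wild-type SOCS-1 allele (EcoRI digest)
KNOCKOUT_KB = {     # knockout allele, by targeting construct
    "pBgalpAloxNeo": 8.2,
    "pBgalpAloxNeoTK": 11.0,
}


def classify_clone(observed_kb: list[float], construct: str,
                   tolerance_kb: float = 0.3) -> str:
    """Classify a neomycin-resistant clone from its EcoRI band sizes (kb).

    tolerance_kb is an assumed gel sizing error, not a value from the source.
    """
    def has_band(expected: float) -> bool:
        return any(abs(band - expected) <= tolerance_kb for band in observed_kb)

    wild_type = has_band(WILD_TYPE_KB)
    knockout = has_band(KNOCKOUT_KB[construct])
    if wild_type and knockout:
        return "targeted"            # one allele replaced, one wild type
    if knockout:
        return "knockout band only"
    if wild_type:
        return "not targeted"        # neo-resistant but locus unchanged: likely random integration
    return "no expected band"


if __name__ == "__main__":
    print(classify_clone([5.4, 8.2], "pBgalpAloxNeo"))
    print(classify_clone([5.4], "pBgalpAloxNeoTK"))
```

For example, `classify_clone([5.4, 8.2], "pBgalpAloxNeo")` returns "targeted", whereas `classify_clone([5.4], "pBgalpAloxNeoTK")` returns "not targeted".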

Those skilled in the art will appreciate that the invention described herein is susceptible to
variations and modifications other than those specifically described. It is to be understood that
the invention includes all such variations and modifications. The invention also includes all of
the steps, features, compositions and compounds referred to or indicated in this specification,
individually or collectively, and any and all combinations of any two or more of said steps or
features.
Table 4.1

Summary of ESTs derived from mouse SOCS-4 cDNAs

SOCS Species EST name End EST no Library source Contig

SOCS-4 Mouse mc65f04 5' EST0549700 d13.5-14.5 mouse embryo m4.1

mf42e06 5' EST0593477 d13.5-14.5 mouse embryo m4.1

mp10c10 5' EST0747905 d8.5 mouse embryo m4.1

mr81g09 5' EST0783081 d13 embryo m4.1

mt19h12 5' EST0816531 spleen m4.1


Table 4.2 

Summary of ESTs derived from human SOCS-4 cDNAs 



SOCS Species 
SOCS-4 Human 



EST name 

27b5 

30d2 

J0159F 

J3802F 

EST19523 



End EST no 

5' EST0534081 



5' 



5' 



EST51149 5'

EST180909 5'

EST182619 5' 

ya99h09 3' 

ye70c04 5'

yh53c09 5' 

3' 



ESTl Oil 01 5 
EST0951375 

EST0953220 



Library source 



retina



EST0534315 retina 



EST0461188 foetal heart



EST0461428 foetal heart 



EST0958884 retina 



placenta 

Jurkat T-lymphocyte

Jurkat T-lymphocyte



EST0103262 placenta 



EST0197390 placenta
EST0197391 



Contig 

h4.2 
h4.2 
h4.2 
h4.2 
h4.2 
h4.2 
h4.2 

h4.1 

h4.2 



EST0172673 foetal liver/spleen h4.2



h4.2 
h4.2 
yh77g11

yh87h05 

yi45h07 
yj04e06 

yq12h06
yq56a06 
yq60e02 

yq92g03 

yq97h06 

yi^OfOl 
yt69c03 

yv30a08 
yv55f07 

yv57h09 

yv87h02 
yv98e11

yw68d10
yw82a03 



5' EST0203418 
3' EST0203419 



5'
3'

5'

5'
3'

5'

3'

5'
3'

5'
3'

5'
3'

5'

5'
3'

3'

5'
3'

5'
3'



5'
3'

5'

5'



EST0204888 
EST0204773 

EST0246604 

EST0258541 
EST0258285 

EST0309968 

EST0346924 

EST0347259 
EST0347209 

EST0355932 
EST0355884 



EST0338303 



EST0463331 



EST0400679 
EST0400680

EST0441370 

EST0463005 



placenta 

placenta 

placenta 
placenta 

foetal liver spleen 
foetal liver spleen 
foetal liver spleen 



melanocyte 
melanocyte 



h4.2 
h4.1 

h4.1 
h4.1 

h4.2 

h4.1 
h4.1 

h4.2 

h4.2 

h4.2 
h4.2 



foetal liver spleen h4.2 
h4.2 



EST0357618 foetal liver spleen h4.2

EST0357416 h4.2 

EST0372402 foetal liver spleen h4.2 

EST0338395 foetal liver spleen h4.2 



h4.2 



EST0458506 foetal liver spleen h4.2 
EST0465391 foetal liver spleen h4.2 



h4.2 



EST0464336 foetal liver spleen h4.2 
EST0458765 h4.2 

EST0388085 



h4.2 

h4.2 
h4.2 



placenta (8-9 wk) h4.2 
placenta (8-9 wk) h4.2 
yx08a07
yx72h06 

yx76b09 
yy37h08 
yy66b02 

za81f08 
zb18f07
zc06e08

zd14g06
zd51h12
zd52b09

ze25g11
ze69f02

zf54f03
zh96e07

zv66h12
zs83a08 

zs83g08 



y 

3' 

5' 
3' 

5' 

5' 

5' 

5' 

3'

5' 
3' 

3'

3' 

5' 
3' 

3' 

5' 
3' 

5' 

5'
3' 

5' 
5'
3' 
5' 
3' 



EST0433678 



EST0407016 melanocyte



EST0435158 
EST0422871 

EST0434011 

EST0451704 

EST0505446 

EST0511777 

EST0485315 

EST0540473 
EST0540354 

EST0564666 

EST0578099 

EST0582012 
EST0581958 

EST0679543 

EST0635563 
EST0635472 

EST0680111

EST0616241
EST0615745

EST1043265

EST0920072 

EST0920016 

EST0920121 

EST0920122 



melanocyte
melanocyte

melanocyte

melanocyte

multiple sclerosis 
lesion 

foetal lung 

foetal lung 



foetal heart 



foetal heart 



foetal heart 



foetal heart 



retina



retina



8-9w foetus 



h4.1

h4.1 

h4.2 
h4.1 

h4.2 

h4.2 

h4.2 

h4.2 
h4.1 



parathyroid tumor h4.1
h4.1 



h4.1 

h4.1

h4.1 
h4.1 

h4.1 

h4.2 
h4.1 

h4.2 



foetal liver spleen h4.2 
h4.2 



h4.2 



germinal centre B h4.1
cell 

h4.1 



germinal centre B h4.1
cell 



h4.1 
Table 5.1

Summary of ESTs derived from mouse SOCS-5 cDNAs

SOCS Species EST name End EST no Library source Contig

SOCS-5 Mouse mc55a01 5' EST0541556 d13.5-14.5 mouse embryo m5.1

mh98f09 5' EST0638237 placenta m5.1

my26h12 5' EST0859939 mixed organs m5.1

ve24e06 5' EST0819106 heart m5.1

Table 5.2 

Summary of ESTs derived from human SOCS-5 cDNAs 

SOCS Species EST name End EST no Library source Contig 

SOCS-5 Human EST15B103 ? EST0258029 adipose tissue h5.1 

EST15B105 ? EST0258028 adipose tissue h5.1 

EST27530 5' EST0965892 cerebellum h5.1

zf50f01 5' EST0679820 retina h5.1

Table 6.1 

Summary of ESTs derived from mouse SOCS-6 cDNAs 

SOCS Species EST name End EST no Library source Contig 

SOCS-6 Mouse mc04c05 5' EST0525832 d19.5 embryo m6.1

md48a03 5' EST0566730 d13.5-14.5 embryo m6.1

mf31d03 5' EST0675970 d13.5-14.5 embryo m6.1

mh26b07 5' EST0628752 d13.5-14.5 placenta m6.1

mh78e11 5' EST0637608 d13.5-14.5 placenta m6.1

mh88h09 5' EST0644383 d13.5-14.5 placenta m6.1

mh94h07 5' EST0638078 d13.5-14.5 placenta m6.1

mi27h04 5' EST0644252 d13.5-14.5 embryo m6.1

mj29c05 5' EST0664093 d13.5-14.5 embryo m6.1

mp66g04 5' EST0757905 thymus m6.1

mw75g03 5' EST0847938 liver m6.1

va53b05 5' EST0901540 d12.5 embryo m6.1

vb34h02 5' EST0930132 lymph node m6.1

vc55d07 3' EST1057735 2 cell embryo m6.1

vc59e05 3' EST1058201 2 cell embryo m6.1

vc67d03 3' EST1057849 2 cell embryo m6.1

vc68d10 3' EST1058663 2 cell embryo m6.1

vc97h01 3' EST1059343 2 cell embryo m6.1

vc99c08 3' EST1059410 2 cell embryo m6.1

vd07h03 3' EST1058173 2 cell embryo m6.1

vd08c01 3' EST1058275 2 cell embryo m6.1

vd09b12 3' EST1058632 2 cell embryo m6.1

vd19b02 3' EST1059723 2 cell embryo m6.1

vd29a04 3' ? none found m6.1

vd46d06 3' ? none found m6.1
Table 6.2

Summary of ESTs derived from human SOCS-6 cDNAs

SOCS Species EST name End EST no 

SOCS-6 Human 

yf61e08 5' EST0184387 

yf93a09 5' EST0186084 

yg05f12 5' EST0191486

yg41f04 5' EST0195017

yg45c02 5' EST0185308

yh11f10 5' EST0236705

yh13b05 5'



zc35a12

ze02h08

zl09a03

zl69e10
zn39d08 



zo39e06 5' 



EST0237191 
EST0236958 

EST0555518 

EST0603826 
EST0603718 

EST0773936 
EST0773892 

EST0683363 

EST0718885 

EST0785947 



Library source 

d73 infant brain 
d73 infant brain 
d73 infant brain 
d73 infant brain 
d73 infant brain 
d73 infant brain 
d73 infant brain 



senescent 
fibroblasts 

foetal heart 



pregnant uterus 
colon 

endothelial cell 
endothelial cell 



Contig 

h6.1 

h6.1 

h6.1 

h6.1 

h6.1 

h6.1 

h6.1 
h6.2 

h6.1 

h6.1 
h6.2 

h6.1 
h6.1 

h6.1 

h6.1 

h6.1 



Table 7.1

Summary of ESTs derived from mouse SOCS-7 cDNAs

SOCS Species EST name End EST no Library source Contig

SOCS-7 Mouse mj39a01 5' EST0665627 d13.5/14.5 embryo m7.1

vi52h07 5' EST1267404 d7.5 embryo m7.1
Table 7.2

Summary of ESTs derived from human SOCS-7 cDNAs

SOCS Species EST name End EST no Library source Contig

SOCS-7 Human STSWI-30171 (G21563) Chromosome 2 h7.2

EST00939 5' EST0000906 hippocampus h7.1

EST12913 3' EST0944382 uterus h7.2

yc29b05
yp49f10
zt10f03

zx73g04

EST0128727 liver

3' EST0301914 retina

3' EST0921231



h7.2

h7.2

EST0922932 germinal centre B cell h7.2

h7.1

EST1102975 ovarian tumour h7.1

Table 8.1

Summary of ESTs derived from mouse SOCS-8 cDNAs

SOCS Species EST name End EST no Library source Contig

SOCS-8 Mouse mj16e09 r1 EST0666240 d13.5/14.5 embryo m8.1

vj27a029 r1 EST1155973 heart m8.1



Table 9.1

Summary of ESTs derived from mouse SOCS-9 cDNAs

SOCS Species EST name End EST no Library source Contig

SOCS-9 Mouse me65d05 5' EST0585211 d13.5/14.5 embryo m9.1



Table 9.2

Summary of ESTs derived from human SOCS-9 cDNAs

SOCS Species EST name End EST no Library source Contig

SOCS-9 Human CSRL-83f2-u (B06659) chromosome 11 h9.1

EST114054 5' EST0939759 placenta

yy06b07 3' EST0434504 melanocyte h9.1

yy06g06 5' EST0443783 melanocyte h9.1

zr40c09 5' EST0832461 melanocyte, heart, uterus h9.1

zr72h01 5' EST0892025 melanocyte, heart, uterus h9.1

3' EST0892026 h9.1

yx92c08 5' EST0441160 melanocyte h9.1

yx93b08 5' EST0441260 melanocyte h9.1

hfe0662 5' EST0889611 foetal heart h9.1



Table 10.1 

Summary of ESTs derived from mouse SOCS-10 cDNAs 
SOCS Species EST name End EST no 



SOCS-10 Mouse mb14d12
mb40f06
mg89b11
mq89e12
mp03g12
vh53c11



5' 



5'



5' 



5' 



Library source Contig 

EST0549887 d19.5 embryo m10.1

EST0515064 d19.5 embryo m10.1

EST0630631 d13.5-14.5 embryo m10.1

EST0776015 heart m10.1



EST0741991 heart



m10.1



5' EST1154634 mammary gland m10.1



Table 10.2 

Summary of ESTs derived from human SOCS-10 cDNAs
SOCS Species EST name End EST no
SOCS-10 Human aa48h10 3' EST1135220



Library source 



germinal centre B cell 



zp35h01 3' EST0819137 muscle
zp97h12 5' EST0835442 muscle



Contig 

h10.2
h10.2
h10.2
3' EST0831211



zqOShOl 



EST0835907 muscle 



zr34g05 5' EST0834251 melanocyte, heart,

uterus 

3' EST0834440
EST73000 5' EST1004491 ovary
HSDHEI005 ? EST0013906 heart 



Table 11.1 

Summary of ESTs derived from human SOCS-11 cDNAs



SOCS Species EST name End 
SOCS-11 Human zt24h06 r1



EST no Library source 

EST0925023 ovarian tumor 



h10.2
h10.1
h10.2
h10.2
h10.2
h10.2



zr43b02 



r1
s1



EST0873006 
EST0872954 



Table 12.1 

Summary of ESTs derived from mouse SOCS-12 cDNAs 
SOCS Species EST name End EST no 



Contig 
11.1 



melanocyte, heart, uterus 11.1 
11.1 



Library source Contig 



SOCS-12 Mouse EST03803 EST1054173 day 7.5 emb ectoplacental cone m12.1

mt18f02 5' EST0817652 3NbMS spleen m12.1

mz60g10 5' EST0890872 lymph node m12.1

vaOScll 5' EST0909449 lymph node m12.1



Table 12.2 

Summary of ESTs derived from human SOCS-12 cDNAs



SOCS Species EST name 



End EST no Library source Contig 



SOCS-12 Human STS-SHGC-13867



Chromosome 2 h12.2



EST177695



5' EST0948071 Jurkat cells



h12.1
EST64550
EST76868 
PMY2369 
yb38f04 

yg74e12
yh13g04

yh48b06
yh53a05

yn48h09

yn90a09
yo08f03

yo11e01
yo63b12

yq56g02
zh57c04
zh79h01
zh99a11
zo92h12

zs48c01 



5' EST0997367 Jurkat cells 

5' EST1007291 pineal body

5' EST1115998 KG-1

5' EST0108807 foetal spleen
3'

5' EST0224407 d73 brain

5' EST0237226 d73 brain

3' EST0236992



5' yh48b06 



5'
3'

5'
3'

3'

5'
3'

3'

5'
3'

3'

3'

3'

3'

5'
3'



placenta 



EST0197282 placenta
EST0197486

EST0278258 brain 
EST0278259 

EST0302557 brain 

EST0301790 brain 
EST0302059 

? none found 

EST0303606 breast 
EST0304085 



h12.1

h12.2

h12.1

h12.1
h12.2

h12.1

h12.1
h12.2

h12.2

h12.2
h12.2

h12.2
h12.2

h12.2

h12.2
h12.2

h12.2

h12.2
h12.2



EST0346935 foetal liver spleen h12.1

EST0594201 foetal liver spleen h12.2

EST0598945 foetal liver spleen h12.2

EST0618570 foetal liver spleen h12.2

EST0803392 ovarian cancer h12.1



EST0803393 



h12.2



5' EST0925714 germinal centre B cell h12.1



3' EST0925530 



h12.2
zs45h02



EST0932296 germinal centre B cell h12.2



Table 13.1

Summary of ESTs derived from mouse SOCS-13 cDNAs

SOCS Species EST name End EST no Library source Contig

SOCS-13 Mouse ma39c09 5' EST0517875 day 19.5 embryo m13.1

me60c05 5' EST0584950 day 13.5/14.5 embryo m13.1

mi78g05 5' EST0653834 day 19.5 embryo m13.1

mk10c11 5' EST0735158 day 19.5 embryo m13.1

mo48g12 5' EST0745111 day 10.5 embryo m13.1

mp94a01 5' EST0762827 thymus m13.1

vb57c07 5' EST1028976 day 11.5 embryo m13.1

vh07c11 5' EST1117269 mammary gland m13.1



Table 13.2

Summary of ESTs derived from human SOCS-13 cDNAs

SOCS Species EST name End EST no Library source Contig

SOCS-13 Human EST59161 5' EST0992726 infant brain h13.1

Table 14.1

Summary of ESTs derived from mouse SOCS-14 cDNAs

SOCS Species EST name End EST no Library source Contig

SOCS-14 Mouse mi75e03 5' EST0651892 d19.5 embryo m14.1

vd29h11 5' EST1067080 2 cell embryo m14.1

vd53g07 5' EST1119627 2 cell embryo m14.1



Table 15.1 

Summary of ESTs derived from mouse SOCS-15 cDNAs 



SOCS 



Species EST name End EST no 



Library source Contig 



SOCS-15 Mouse mh29b05 5' EST0628834 placenta 
mh98h09 5' EST0638243 placenta



ml45a02

mu43a10

my38c09 

vj37h07 

AC002393 



5'
5' 
5' 



EST0687171 testis 



EST851588
EST878461
EST1174791



m15.1
m15.1
m15.1



thymus m15.1

pooled organs m15.1

diaphragm m15.1

Chromosome 6 m15.1
BAC 



Table 15.2

Summary of ESTs derived from human SOCS-15 cDNAs 



SOCS Species EST name End EST no Library source Contig 

SOCS-15 Human EST98889 5' EST1026568 thyroid h15.1



ne48b05



yb12h12



HSU47924 



5' 
3' 



EST1138057 colon tumour h15.1



EST0098885 placenta 
EST0098886 



h15.1
h15.1



Chromosome 12 h15.1
BAC 
(2) INFORMATION FOR SEQ ID NO:1:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:1:



CACGCCGCCC ACGTGAAGGC 20 



(2) INFORMATION FOR SEQ ID NO:2:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:2:
TTCGCCAATG ACAAGACGCT 20 



(2) INFORMATION FOR SEQ ID NO: 3: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 1236 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA 

(ix) FEATURE: 

(A) NAME/KEY: CDS 

(B) LOCATION: 1..636



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 3: 



CGAGGCTCAA GCTCCGGGCG GATTCTGCGT GCCGCTCTCG CTCCTTGGGG TCTGTTGGCC -101 
GGCCTGTGCC ACCCGGACGC CCGGCTCACT GCCTCTGTCT CCCCCATCAG CGCAGCCCCG -41 
GACGCTATGG CCCACCCCTC CAGCTGGCCC CTCGAGTAGG -1 



ATG GTA GCA CGC AAC CAG GTG GCA GCC GAC AAT GCG ATC TCC CCG GCA 48
Met Val Ala Arg Asn Gln Val Ala Ala Asp Asn Ala Ile Ser Pro Ala
1 5 10 15

GCA GAG CCC CGA CGG CGG TCA GAG CCC TCC TCG TCC TCG TCT TCG TCC 96 
Ala Glu Pro Arg Arg Arg Ser Glu Pro Ser Ser Ser Ser Ser Ser Ser 
20 25 30 

TCG CCA GCG GCC CCC GTG CGT CCC CGG CCC TGC CCG GCG GTC CCA GCC 144
Ser Pro Ala Ala Pro Val Arg Pro Arg Pro Cys Pro Ala Val Pro Ala
35 40 45 

CCA GCC CCT GGC GAC ACT CAC TTC CGC ACC TTC CGC TCC CAC TCC GAT 192
Pro Ala Pro Gly Asp Thr His Phe Arg Thr Phe Arg Ser His Ser Asp 
50 55 60 

TAC CGG CGC ATC ACG CGG ACC AGC GCG CTC CTG GAC GCC TGC GGC TTC 240
Tyr Arg Arg Ile Thr Arg Thr Ser Ala Leu Leu Asp Ala Cys Gly Phe
65 70 75 80 

TAT TGG GGA CCC CTG AGC GTG CAC GGG GCG CAC GAG CGG CTG CGT GCC 288

Tyr Trp Gly Pro Leu Ser Val His Gly Ala His Glu Arg Leu Arg Ala 
85 90 95 

GAG CCC GTG GGC ACC TTC TTG GTG CGC GAC AGT CGT CAA CGG AAC TGC 336
Glu Pro Val Gly Thr Phe Leu Val Arg Asp Ser Arg Gln Arg Asn Cys
100 105 110 

TTC TTC GCG CTC AGC GTG AAG ATG GCT TCG GGC CCC ACG AGC ATC CGC 384 
Phe Phe Ala Leu Ser Val Lys Met Ala Ser Gly Pro Thr Ser Ile Arg
115 120 125 

GTG CAC TTC CAG GCC GGC CGC TTC CAC TTG GAC GGC AGC CGC GAG ACC 432
Val His Phe Gln Ala Gly Arg Phe His Leu Asp Gly Ser Arg Glu Thr
130 135 140 

TTC GAC TGC CTT TTC GAG CTG CTG GAG CAC TAC GTG GCG GCG CCG CGC 480
Phe Asp Cys Leu Phe Glu Leu Leu Glu His Tyr Val Ala Ala Pro Arg 
145 150 155 160 

CGC ATG TTG GGG GCC CCG CTG CGC CAG CGC CGC GTG CGG CCG CTG GAG 528 
Arg Met Leu Gly Ala Pro Leu Arg Gln Arg Arg Val Arg Pro Leu Gln
165 170 175 

GAG CTG TGT CGC CAG CGC ATC GTG GCC GCC GTG GGT CGC GAG AAC CTG 576 
Glu Leu Cys Arg Gln Arg Ile Val Ala Ala Val Gly Arg Glu Asn Leu
180 185 190 

GCG CGC ATC CGT CTT AAC CCG GTA CTC CGT GAG TAG CTG AGT TCC TTC 624 
Ala Arg Ile Pro Leu Asn Pro Val Leu Arg Asp Tyr Leu Ser Ser Phe
195 200 205 

CCC TTC CAG ATC TGA CCGGCTG CCGCTGTGCC GCAGCATTAA GTGGGGGCGC 676 
Pro Phe Gln Ile *
210

CTTATTATTT CTTATTATTA ATTATTATTA TTTTTCTGGA ACCACGTGGG AGCCCTCCCC 736

GCCTGGGTCG GAGGGAGTGG TTGTGGAGGG TGAGATGCCT CCCACTTCTG GCTGGAGACC 796

TCATCCCACC TCTCAGGGGT GGGGGTGCTC CCCTCCTGGT GCTCCCTCCG GGTCCCCCCT 856

GGTTGTAGCA GCTTGTGTCT GGGGCCAGGA CCTGAATTCC ACTCCTACCT CTCCATGTTT 916

ACATATTCCC AGTATCTTTG CACAAACCAG GGGTCGGGGA GGGTCTCTGG CTTCATTTTT 976

CTGCTGTGCA GAATATCCTA TTTTATATTT TTACAGCCAG TTTAGGTAAT AAACTTTATT 1036

ATGAAAGTTT TTTTTTAAAA GAAAAAAAAA AAAAAAAAA 1075



(2) INFORMATION FOR SEQ ID NO: 4:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 212 amino acids 
(B) TYPE: amino acid
(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: protein 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 4: 



Met Val Ala Arg Asn Gln Val Ala Ala Asp Asn Ala Ile Ser Pro Ala
1 5 10 15

Ala Glu Pro Arg Arg Arg Ser Glu Pro Ser Ser Ser Ser Ser Ser Ser 
20 25 30 

Ser Pro Ala Ala Pro Val Arg Pro Arg Pro Cys Pro Ala Val Pro Ala 
35 40 45 

Pro Ala Pro Gly Asp Thr His Phe Arg Thr Phe Arg Ser His Ser Asp 
50 55 60 

Tyr Arg Arg Ile Thr Arg Thr Ser Ala Leu Leu Asp Ala Cys Gly Phe
65 70 75 80 

Tyr Trp Gly Pro Leu Ser Val His Gly Ala His Glu Arg Leu Arg Ala 
85 90 95 

Glu Pro Val Gly Thr Phe Leu Val Arg Asp Ser Arg Gln Arg Asn Cys
100 105 110 

Phe Phe Ala Leu Ser Val Lys Met Ala Ser Gly Pro Thr Ser Ile Arg
115 120 125 

Val His Phe Gln Ala Gly Arg Phe His Leu Asp Gly Ser Arg Glu Thr
130 135 140 

Phe Asp Cys Leu Phe Glu Leu Leu Glu His Tyr Val Ala Ala Pro Arg 
145 150 155 160 

Arg Met Leu Gly Ala Pro Leu Arg Gln Arg Arg Val Arg Pro Leu Gln
165 170 175 

Glu Leu Cys Arg Gln Arg Ile Val Ala Ala Val Gly Arg Glu Asn Leu
180 185 190 

Ala Arg Ile Pro Leu Asn Pro Val Leu Arg Asp Tyr Leu Ser Ser Phe
195 200 205

Pro Phe Gln Ile
210 



(2) INFORMATION FOR SEQ ID NO: 5: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 1121 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA 



(ix) FEATURE: 

(A) NAME/KEY: CDS 

(B) LOCATION: 223..819

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 5:

GCGATCTGTG GGTGACAGTG TCTGCGAGAG ACTTTGCCAC ACCATTCTGC CGGAATTTGG 60 

AGAAAAAGAA CCAGCCGCTT CCAGTCCCCT CCCCCTCCGC CACCATTTCG GACACCCTGC 120 

ACACTCTCGT TTTGGGGTAC CCTGTGACTT CCAGGCAGCA CGCGAGGTCC ACTGGCCCCA 180 

GCTCGGGCGA CCAGCTGTCT GGGACGTGTT GACTCATCTC CC ATG ACC CTG CGG 234 

Met Thr Leu Arg 
1 

TGC CTG GAG CCC TCC GGG AAT GGA GCG GAC AGG ACG CGG AGC CAG TGG 282
Cys Leu Glu Pro Ser Gly Asn Gly Ala Asp Arg Thr Arg Ser Gln Trp
5 10 15 20 

GGG ACC GCG GGG TTG CCG GAG GAA CAG TCC CCC GAG GCG GCG CGT CTG 330 
Gly Thr Ala Gly Leu Pro Glu Glu Gln Ser Pro Glu Ala Ala Arg Leu
25 30 35 

GCG AAA GCC CTG CGC GAG CTC AGT CAA ACA GGA TGG TAC TGG GGA AGT 378 
Ala Lys Ala Leu Arg Glu Leu Ser Gln Thr Gly Trp Tyr Trp Gly Ser
40 45 50 

ATG ACT GTT AAT GAA GCC AAA GAG AAA TTA AAA GAG GCT CCA GAA GGA 426 
Met Thr Val Asn Glu Ala Lys Glu Lys Leu Lys Glu Ala Pro Glu Gly 
55 60 65 

ACT TTC TTG ATT AGA GAT AGT TCG CAT TCA GAC TAC CTA CTA ACT ATA 474 
Thr Phe Leu Ile Arg Asp Ser Ser His Ser Asp Tyr Leu Leu Thr Ile
70 75 80 

TCC GTT AAG ACG TCA GCT GGA CCG ACT AAC CTG CGG ATT GAG TAC CAA 522 
Ser Val Lys Thr Ser Ala Gly Pro Thr Asn Leu Arg Ile Glu Tyr Gln
85 90 95 100 

GAT GGG AAA TTC AGA TTG GAT TCT ATC ATA TGT GTC AAG TCC AAG CTT 570 
Asp Gly Lys Phe Arg Leu Asp Ser Ile Ile Cys Val Lys Ser Lys Leu
105 110 115 

AAA CAG TTT GAC AGT GTG GTT CAT CTG ATT GAC TAC TAT GTC CAG ATG 618 
Lys Gln Phe Asp Ser Val Val His Leu Ile Asp Tyr Tyr Val Gln Met
120 125 130 

TGC AAG GAT AAA CGG ACA GGC CCA GAA GCC CCA CGG AAT GGG ACT GTT 666 
Cys Lys Asp Lys Arg Thr Gly Pro Glu Ala Pro Arg Asn Gly Thr Val 
135 140 145 

CAC CTG TAC CTG ACC AAA CCT CTG TAT ACA TCA GCA CCC ACT CTG CAG 714 
His Leu Tyr Leu Thr Lys Pro Leu Tyr Thr Ser Ala Pro Thr Leu Gln
150 155 160 

CAT TTC TGT CGA CTC GCC ATT AAC AAA TGT ACC GGT ACG ATC TGG GGA 762 
His Phe Cys Arg Leu Ala Ile Asn Lys Cys Thr Gly Thr Ile Trp Gly
165 170 175 180 

CTG CCT TTA CCA ACA AGA CTA AAA GAT TAC TTG GAA GAA TAT AAA TTC 810 
Leu Pro Leu Pro Thr Arg Leu Lys Asp Tyr Leu Glu Glu Tyr Lys Phe 
185 190 195 

CAG GTA TAAGTATTTC TCTCTCTTTT TCGTTTTTTT TTAAAAAAAA AAAAACACAT 866 
Gln Val

GCCTCATATA GACTATCTCC GAATGCAGCT ATGTGAAAGA GAACCCAGAG GCCCTCCTCT 926 

GGATAACTGC GCAGAATTCT CTCTTAAGGA CAGTTGGGCT CAGTCTAACT TAAAGGTGTG 986 



AAGATGTAGC TAGGTATTTT AAAGTTCCCC TTAGGTAGTT TTAGCTGAAT GATGCTTTCT 1046 
TTCCTATGGC TGCTCAAGAT CAAATGGCCC TTTTAAATGA AACAAAACAA AACAAAACAA 1106 
AAAAAAAAAA AAAAA 1121 

(2) INFORMATION FOR SEQ ID NO: 6: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 198 amino acids 

(B) TYPE: amino acid 
(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: protein 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 6: 

Met Thr Leu Arg Cys Leu Glu Pro Ser Gly Asn Gly Ala Asp Arg Thr 
1 5 10 15

Arg Ser Gln Trp Gly Thr Ala Gly Leu Pro Glu Glu Gln Ser Pro Glu
20 25 30 

Ala Ala Arg Leu Ala Lys Ala Leu Arg Glu Leu Ser Gln Thr Gly Trp
35 40 45 

Tyr Trp Gly Ser Met Thr Val Asn Glu Ala Lys Glu Lys Leu Lys Glu 
50 55 60 

Ala Pro Glu Gly Thr Phe Leu Ile Arg Asp Ser Ser His Ser Asp Tyr
65 70 75 80 

Leu Leu Thr Ile Ser Val Lys Thr Ser Ala Gly Pro Thr Asn Leu Arg
85 90 95 

Ile Glu Tyr Gln Asp Gly Lys Phe Arg Leu Asp Ser Ile Ile Cys Val
100 105 110 

Lys Ser Lys Leu Lys Gln Phe Asp Ser Val Val His Leu Ile Asp Tyr
115 120 125 

Tyr Val Gln Met Cys Lys Asp Lys Arg Thr Gly Pro Glu Ala Pro Arg
130 135 140 

Asn Gly Thr Val His Leu Tyr Leu Thr Lys Pro Leu Tyr Thr Ser Ala 
145 150 155 160 

Pro Thr Leu Gln His Phe Cys Arg Leu Ala Ile Asn Lys Cys Thr Gly
165 170 175 

Thr Ile Trp Gly Leu Pro Leu Pro Thr Arg Leu Lys Asp Tyr Leu Glu
180 185 190 

Glu Tyr Lys Phe Gln Val
195 

(2) INFORMATION FOR SEQ ID NO: 7:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 2187 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA 

(ix) FEATURE: 

(A) NAME/KEY: CDS 

(B) LOCATION: 18..695



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 7: 

CGCTGGCTCC GTGCGCC ATG GTC ACC CAC AGC AAG TTT CCC GCC GCC GGG 50 
Met Val Thr His Ser Lys Phe Pro Ala Ala Gly 
1 5 10

ATG AGC CGC CCC CTG GAC ACC AGC CTG CGC CTC AAG ACC TTC AGC TCC 98 
Met Ser Arg Pro Leu Asp Thr Ser Leu Arg Leu Lys Thr Phe Ser Ser 
15 20 25 

AAA AGC GAG TAC CAG CTG GTG GTG AAC GCC GTG CGC AAG CTG CAG GAG 146 
Lys Ser Glu Tyr Gln Leu Val Val Asn Ala Val Arg Lys Leu Gln Glu
30 35 40 

AGC GGA TTC TAC TGG AGC GCC GTG ACC GGC GGC GAG GCG AAC CTG CTG 194 
Ser Gly Phe Tyr Trp Ser Ala Val Thr Gly Gly Glu Ala Asn Leu Leu 
45 50 55 

CTC AGC GCC GAG CCC GCG GGC ACC TTT CTT ATC CGC GAC AGC TCG GAC 242 
Leu Ser Ala Glu Pro Ala Gly Thr Phe Leu Ile Arg Asp Ser Ser Asp
60 65 70 75 

CAG CGC CAC TTC TTC ACG TTG AGC GTC AAG ACC CAG TCG GGG ACC AAG 290 
Gln Arg His Phe Phe Thr Leu Ser Val Lys Thr Gln Ser Gly Thr Lys
80 85 90 

AAC CTA CGC ATC CAG TGT GAG GGG GGC AGC TTT TCG CTG CAG AGT GAC 338 
Asn Leu Arg Ile Gln Cys Glu Gly Gly Ser Phe Ser Leu Gln Ser Asp
95 100 105 

CCC CGA AGC ACG CAG CCA GTT CCC CGC TTC GAC TGT GTA CTC AAG CTG 386 
Pro Arg Ser Thr Gln Pro Val Pro Arg Phe Asp Cys Val Leu Lys Leu
110 115 120 

GTG CAC CAC TAC ATG CCG CCT CCA GGG ACC CCC TCC TTT TCT TTG CCA 434 
Val His His Tyr Met Pro Pro Pro Gly Thr Pro Ser Phe Ser Leu Pro 
125 130 135 

CCC ACG GAA CCC TCG TCC GAA GTT CCG GAG CAG CCA CCT GCC CAG GCA 482 
Pro Thr Glu Pro Ser Ser Glu Val Pro Glu Gln Pro Pro Ala Gln Ala
140 145 150 155 

CTC CCC GGG AGT ACC CCC AAG AGA GCT TAC TAC ATC TAT TCT GGG GGC 530
Leu Pro Gly Ser Thr Pro Lys Arg Ala Tyr Tyr Ile Tyr Ser Gly Gly
160 165 170 

GAG AAG ATT CCG CTG GTA CTG AGC CGA CCT CTC TCC TCC AAC GTG GCC 578 
Glu Lys Ile Pro Leu Val Leu Ser Arg Pro Leu Ser Ser Asn Val Ala
175 180 185 

ACC CTC CAG CAT CTT TGT CGG AAG ACT GTC AAC GGC CAC CTG GAC TCC 626 
Thr Leu Gln His Leu Cys Arg Lys Thr Val Asn Gly His Leu Asp Ser
190 195 200 

TAT GAG AAA GTG ACC CAG CTG CCT GGA CCC ATT CGG GAG TTC CTG GAT 674 
Tyr Glu Lys Val Thr Gln Leu Pro Gly Pro Ile Arg Glu Phe Leu Asp
205 210 215 

CAG TAT GAT GCT CCA CTT TAAGGAGCAA AAGGGTCAGA GGGGGGCCTG 722 
Gln Tyr Asp Ala Pro Leu
220 225

GGTCGGTCGG TCGCCTCTCC TCCGAGGCAC ATGGCACAAG CACAAAAATC CAGCCCCAAC 782

GGTCGGTAGC TCCCAGTGAG CCAGGGGCAG ATTGGCTTCT TCCTCAGGCC CTCCACTCCC 842

GCAGAGTAGA GCTGGCAGGA CCTGGAATTC GTCTGAGGGG AGGGGGAGCT GCCACCTGCT 902

TTCCCCCCTC CCCCAGCTCC AGCTTCTTTC AAGTGGAGCC AGCCGGCCTG GCCTGGTGGG 962

ACAATACCTT TGACAAGCGG ACTCTCCCCT CCCCTTCCTC CACACCCCCT CTGCTTCCCA 1022

AGGGAGGTGG GGACACCTCC AAGTGTTGAA CTTAGAACTG CAAGGGGAAT CTTCAAACTT 1082

TCCCGCTGGA ACTTGTTTGC GCTTTGATTT GGTTTGATCA AGAGCAGGCA CCTGGGGGAA 1142

GGATGGAAGA GAAAAGGGTG TGTGAAGGGT TTTTATGCTG GCCAAAGAAA TAACCACTCC 1202

CACTGCCCAA CCTAGGTGAG GAGTGGTGGC TCCTGGCTCT GGGGAGAGTG GCAAGGGGTG 1262

ACCTGAAGAG AGCTATACTG GTGCCAGGCT CCTCTCCATG GGGCAGCTAA TGAAACCTCG 1322

CAGATCCCTT GCACCCCAGA ACCCTCCCCG TTGTGAAGAG GCAGTAGCAT TTAGAAGGGA 1382

GACAGATGAG GCTGGTGAGC TGGCCGCCTT TTCCAACACC GAAGGGAGGC AGATCAACAG 1442

ATGAGCCATC TTGGAGCCCA GGTTTCCCCT GGAGCAGATG GAGGGTTCTG CTTTGTCTCT 1502

CCTATGTGGG GCTAGGAGAC TCGCCTTAAA TGCCCTCTGT CCCAGGGATG GGGATTGGCA 1562

CACAAGGAGC CAAACACAGC CAATAGGCAG AGAGTTGAGG GATTCACCCA GGTGGCTACA 1622

GGCCAGGGGA AGTGGCTGCA GGGGAGAGAC CCAGTCACTC CAGGAGACTC CTGAGTTAAC 1682

ACTGGGAAGA CATTGGCCAG TCCTAGTCAT CTCTCGGTCA GTAGGTCCGA GAGCTTCCAG 1742

GCCCTGCACA GCCCTCCTTT CTCACCTGGG GGGAGGCAGG AGGTGATGGA GAAGCCTTCC 1802

CATGCCGCTC ACAGGGGCCT CACGGGAATG CAGCAGCCAT GCAATTACCT GGAACTGGTC 1862

CTGTGTTGGG GAGAAACAAG TTTTCTGAAG TCAGGTATGG GGCTGGGTGG GGCAGCTGTG 1922

TGTTGGGGTG GCTTTTTTCT CTCTGTTTTG AATAATGTTT ACAATTTGCC TCAATCACTT 1982

TTATAAAAAT CCACCTCCAG CCCGCCCCTC TCCCCACTCA GGCCTTCGAG GCTGTCTG7VA 2042

GATGCTTGAA AAACTCAACC AAATCCCAGT TCAACTCAGA CTTTGCACAT ATATTTATAT 2102

TTATACTCAG AAAAGAAACA TTTCAGTAAT TTATAATAAA AGAGCACTAT TTTTTAATGA 2162

AAAAAAAAAA AAAAA 2187



(2) INFORMATION FOR SEQ ID NO: 8: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 225 amino acids

(B) TYPE: amino acid 
(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: protein 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 8:

Met Val Thr His Ser Lys Phe Pro Ala Ala Gly Met Ser Arg Pro Leu 
1 5 10 15

Asp Thr Ser Leu Arg Leu Lys Thr Phe Ser Ser Lys Ser Glu Tyr Gln
20 25 30 
Leu Val Val Asn Ala Val Arg Lys Leu Gln Glu Ser Gly Phe Tyr Trp
35 40 45 

Ser Ala Val Thr Gly Gly Glu Ala Asn Leu Leu Leu Ser Ala Glu Pro 
50 55 60 

Ala Gly Thr Phe Leu Ile Arg Asp Ser Ser Asp Gln Arg His Phe Phe
65 70 75 80 

Thr Leu Ser Val Lys Thr Gln Ser Gly Thr Lys Asn Leu Arg Ile Gln
85 90 95 

Cys Glu Gly Gly Ser Phe Ser Leu Gln Ser Asp Pro Arg Ser Thr Gln
100 105 110 

Pro Val Pro Arg Phe Asp Cys Val Leu Lys Leu Val His His Tyr Met 
115 120 125 

Pro Pro Pro Gly Thr Pro Ser Phe Ser Leu Pro Pro Thr Glu Pro Ser 
130 135 140 

Ser Glu Val Pro Glu Gln Pro Pro Ala Gln Ala Leu Pro Gly Ser Thr
145 150 155 160 

Pro Lys Arg Ala Tyr Tyr Ile Tyr Ser Gly Gly Glu Lys Ile Pro Leu
165 170 175 

Val Leu Ser Arg Pro Leu Ser Ser Asn Val Ala Thr Leu Gln His Leu
180 185 190 

Cys Arg Lys Thr Val Asn Gly His Leu Asp Ser Tyr Glu Lys Val Thr 
195 200 205 

Gln Leu Pro Gly Pro Ile Arg Glu Phe Leu Asp Gln Tyr Asp Ala Pro
210 215 220 

Leu 
225 



(2) INFORMATION FOR SEQ ID NO: 9: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 1094 base pairs 

(B) TYPE: nucleic acid 
(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 9:



CTCCGGCTGG CCCCTTCTGT AGGATGGTAG CACACAACCA GGTGGCAGCC GACAATGCAG 60

TCTCCACAGC AGCAGAGCCC CGACGGCGGC CAGAACCTTC CTCCTCTTCC TCCTCCTCGC 120

CCGCGGCCCC CGCGCGCCCG CGGCCGTGCC CCGCGGTCCC GGCCCCGGCC CCCGGCGACA 180

CGCACTTCCG CACATTCCGT TCGCACGCCG ATTACCGGCG CATCACGCGC GCCAGCGCGC 240

TCCTGGACGC CTGCGGATTC TACTGGGGGC CCCTGAGCGT GCACGGGGCG CACGAGCGGC 300

TGCGCGCCGA GCCCGTGGGC ACCTTCCTGG TGCGCGACAG CCGCCAGCGG AACTGCTTTT 360

TCGCCCTTAG CGTGAAGATG GCCTCGGGAC CCACGAGCAT CCGCGTGCAC TTTCAGGCCG 420

GCCGCTTTCA CCTGGATGGC AGCCGCGAGA GCTTCGACTG CCTCTTCGAG CTGCTGGAGC 480

ACTACGTGGC GGCGCCGCGC CGCATGCTGG GGGCCCCGCT GCGCCAGCGC CGCGTGCGGC 540

CGCTGCAGGA GCTGTGCCGC CAGCGCATCG TGGCCACCGT GGGCCGCGAG AACCTGGCTC 600

GCATCCCCCT CAACCCCGTC CTCCGCGACT ACCTGAGCTC CTTCCCCTTC CAGATTTGAC 660

CGGCAGCGCC CGCCGTGCAC GCAGCATTAA CTGGGATGCC GTGTTATTTT GTTATTACTT 720

GCCTGGAACC ATGTGGGTAC CCTCCCCGGC CTGGGTTGGA GGGAGCGGAT GGGTGTAGGG 780

GCGAGGCGCC TCCCGCCCTC GGCTGGAGAC GAGGCCGCAG ACCCCTTCTC ACCTCTTGAG 840

GGGGTCCTCC CCCTCCTGGT GCTCCCTCTG GGTCCCCCTG GTTGTTGTAG CAGCTTAACT 900

GTATCTGGAG CCAGGACCTG AACTCGCACC TCCTACCTCT TCATGTTTAC ATATACCCAG 960

TATCTTTGCA CAAACCAGGG GTTGGGGGAG GGTCTCTGGC TTTATTTTTC TGCTGTGCAG 1020

AATCCTATTT TATATTTTTT AAAGTCAGTT TAGGTAATAA ACTTTATTAT GAAAGTTTTT 1080

TTTTTTAAAA AAAA 1094



(2) INFORMATION FOR SEQ ID NO: 10: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 211 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: protein 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 10: 

Met Val Ala His Asn Gln Val Ala Ala Asp Asn Ala Val Ser Thr Ala
1 5 10 15

Ala Glu Pro Arg Arg Arg Pro Glu Pro Ser Ser Ser Ser Ser Ser Ser 
20 25 30 

Pro Ala Ala Pro Ala Arg Pro Arg Pro Cys Pro Ala Val Pro Ala Pro 
35 40 45 

Ala Pro Gly Asp Thr His Phe Arg Thr Phe Arg Ser His Ala Asp Tyr 
50 55 60 

Arg Arg Ile Thr Arg Ala Ser Ala Leu Leu Asp Ala Cys Gly Phe Tyr
65 70 75 80 

Trp Gly Pro Leu Ser Val His Gly Ala His Glu Arg Leu Arg Ala Glu 
85 90 95 
Pro Val Gly Thr Phe Leu Val Arg Asp Ser Arg Gln Arg Asn Cys Phe
100 105 110 

Phe Ala Leu Ser Val Lys Met Ala Ser Gly Pro Thr Ser Ile Arg Val
115 120 125 

His Phe Gln Ala Gly Arg Phe His Leu Asp Gly Ser Arg Glu Ser Phe
130 135 140 

Asp Cys Leu Phe Glu Leu Leu Glu His Tyr Val Ala Ala Pro Arg Arg 
145 150 155 160 

Met Leu Gly Ala Pro Leu Arg Gln Arg Arg Val Arg Pro Leu Gln Glu
165 170 175 

Leu Cys Arg Gln Arg Ile Val Ala Thr Val Gly Arg Glu Asn Leu Ala
180 185 190 

Arg Ile Pro Leu Asn Pro Val Leu Arg Asp Tyr Leu Ser Ser Phe Pro
195 200 205 

Phe Gln Ile
210 

(2) INFORMATION FOR SEQ ID NO: 11: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 2807 base pairs 

(B) TYPE: nucleic acid 
(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 11: 



GGAAACCGAG GCGGGGAGAC CAGGAGGCCT TGGCCTCAGA GCTTCAGAGT CGCGTGGCAG 60

CAAACAGAGA AACCTGTAGA GGGCAGTGTG CGTCACTTAG CTCAGGGAAG CTGCACGCGA 120

AACTCACCCG CCTTCATTCA TAAACATCGT CAGCTAGGCA CCTACTCCTG GGCTTTCAGG 180

ACAAACTGAA TCACGAAACC ACAGTGTCCT TAAAATAGGT CTGACCGCCT GAATCCCTGG 240

CCAAGGTGTG TACGGGGCAT GGGAGCCCTT GTGCAGAGAT GCTTGCAGGA GCCTTGAGGG 300

GCTCTGTAAG ACAGAGGCTA GGAAGACAAA GTTGGGGGCT ACAGCTTCTT GTCCTGCCCG 360

GGGCCTCAGT TTCTTCGGTT GCCCACGTAG GAGTGCAGAG AGTCCAGCCC CTGGGGACCC 420

AACCCAACCC CGCCCAGTTT CCGAGGAACT CGTCCGGGAG CGGGGGCGCC CCTCCCGCAC 480

CGCCTTAGGC TTCCTTTGAA GCCTCTGCGG TCAGGCCACC GCTTCCTGGG AAGCCCAAGC 540

CAAGGCCAGG CCGAGTGGCC AACGGGAGGG GCCCGCGCGC GATTCTGGAG GAGGGCGGCG 600

GCCCCACAGG TCTCCAGGGC TGGCTAGCCG GGCTCCTAGA GCGGAGACTG CCAAGGCCTT 660

CGGGTCCTGG GCAGGAAGGA TCCTGGCAGG GAGGAGTTGC TTGGGGGGTG GGGGGGAAAG 720

GCTCCAGGCG CGGTGGAGCT CTGACCAGGA GAATGCACAC ACTCGGAGGG GAGGAGGCGT 780

GTCAGCCCCA AGCTAGCATC CCACCCGGGG AGCAGCGATG TGGGGCGAAG GTAGCCAGAG 840

CAAAAGAGCA GGCACCAGGT GACACGAAAC AGAAGATTCC GGGTAGAGCC AGAACCCCAG 900

AAGTCCCATT CAGGGAAGGT GCGAGGCGAG AACGAGTTAG GTGGACCCTC TCCAGGGGCA 960

GCCAAAGAAA TCTAAAGAGA ACCCGAAGGA CTTGCCGGAA AGAGAAACCG AAAGCGGCGG 1020

TGGGCGGGAT CGGTGGGCGG GGCCTCCCTG GTTTAAGAGC TTGATGCAGG GGCGGGCAGC 1080

AGCAGAGAGA ACTGCGGCCG TGGCAGCGGC ACGGCTCCCG GCCCCGGAGC ATGCGCGACA 1140

GCAGCCCCGG AACCCCCAGC CGCGGCGCCC CGCGTCCCGC CGCCAGGTGA GCCGAGGCAG 1200

CTGCGAAGGA GCAGGCGGGA GGGGATGGGA GGAAGGGGAG CAGAGCCTGG CAGGACTATC 1260

CTCGCAGACT GCATGGCGGG GTCGTGGATG CTATGCCTCT GGCGCCCGCC CCACCGGCTG 1320

GCCCAGGCGG CCCCTCGCGC GCGCGGGGCG CCGTCAGCCC CTCCTCTCCG GCCCTGAGCC 1380

CGGATCGTCC GCCCGGGTTC CAGTTCCCGG CGTGGCCAGT AGGCGGCAAC CGCGAGGCGG 1440

CAAGCCACCC AGCGGGGACG GCCTGGAGTC GGGCCCCTCT CCACGCCCCC TTCTCCACGC 1500

GCGCGGGGAG GCAGGGCTCC ACCGCCAGTC TGGAAGGGTT CCACATACAG GAACGGCCTA 1560

CTTCGCAGAT GAGCCCACCG AGGCTCAGGC TCCGGGCGGA TTCTGCGTGT CACCCTCGCT 1620

CCTTGGGGTC CGCTGGCCGG CCTGTGCCAC CCGGACGCCC GGTTCACTGC CTCTGTCTCC 1680

CCCATCAGCG CAGCCCCGGA CGCTATGGCC CACCCCTCCA GCTGGCCCCT CGAGTAGGAT 1740

GGTAGCACGT AACCAGGTGG AAGCCGACAA TGCGATCTCC CCGGCATCAG AGCCCCGACG 1800

GCGGCCAGAG CCATCCTCGT CCTCGTCTTC GTCCTCGCCG GCGGCCCCGG CGCGTCCCCG 1860

GCCCTGCCCG GTGGTCCCGG CCCCGGCTCC GGGCGACACT CACTTCCGCA CCTTCCGCTC 1920

CCACTCTGAT TACCGGCGCA TCACGCGGAC CAGCGCTCTC CTGGACGCCT GCGGCTTCTA 1980

CTGGGGACCC CTGAGCGTGC ATGGGGCGCA CGAACGGCTG CGTTCCGAAC CCGTGGGCAC 2040

CTTCTTGGTG CGCGACAGTC GCCAGCGGAA CTGCTTCTTC GCGCTCAGCG TGAAGATGGC 2100

TTCGGGCCCC ACGAGCATTC GTGTGCACTT CCAGGCCGGC CGCTTCCACC TGGACGGCAA 2160

CCGCGAGACC TTCGACTGCC TCTTCGAGCT GCTGGAGCAC TACGTGGCGG CGCCGCGCCG 2220

CATGTTGGGG GCCCCACTGC GCCAGCGCCG CGTGCGGCCG CTGCAGGAGC TGTGTCGCCA 2280

GCGCATCGTG GCCGCCGTGG GTCGCGAGAA CCTGGCACGC ATCCCTCTTA ACCCGGTACT 2340

CCGTGACTAC CTGAGTTCCT TCCCCTTCCA GATCTGACCG GCTGCCGCCG TGCCCGCAGA 2400

ATTAAGTGGG AGCGCCTTAT TATTTCTTAT TATTAATTAT TATTATTTTT CTGGAACCAC 2460

GTGGGAGCCC TCCCCGCCTA GGTCGGAGGG AGTGGGTGTG GAGGGTGAGA TCCCTCCCAC 2520

TTCTGGCTGG AGACCTTATC CCGCCTCTCG GGGGGCCTCC CCTCCTGGTG CTCCCTCCCG 2580

GTCCCCCTGG TTGTAGCAGC TTGTGTCTGG GGCCAGGACC TGAACTCCAC GCCTACCTCT 2640

CCATGTTTAC ATGTTCCCAG TATCTTTGCA CAAACCAGGG GTGGGGGAGG GTCTCTGGCT 2700

TCATTTTTCT GCTGTGCAGA ATATTCTATT TTATATTTTT ACATCCAGTT TAGATAATAA 2760

ACTTTATTAT GAAAGTTTTT TTTTTTAAAG AAACAAAGAT TTCTAGA 2807



(2) INFORMATION FOR SEQ ID NO: 12: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 212 amino acids 
(B) TYPE: amino acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: protein 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 12: 

Met Val Ala Arg Asn Gln Val Glu Ala Asp Asn Ala Ile Ser Pro Ala
1 5 10 15

Ser Glu Pro Arg Arg Arg Pro Glu Pro Ser Ser Ser Ser Ser Ser Ser 
20 25 30 

Ser Pro Ala Ala Pro Ala Arg Pro Arg Pro Cys Pro Val Val Pro Ala 
35 40 45 

Pro Ala Pro Gly Asp Thr His Phe Arg Thr Phe Arg Ser His Ser Asp 
50 55 60 

Tyr Arg Arg Ile Thr Arg Thr Ser Ala Leu Leu Asp Ala Cys Gly Phe
65 70 75 80 

Tyr Trp Gly Pro Leu Ser Val His Gly Ala His Glu Arg Leu Arg Ser 
85 90 95 

Glu Pro Val Gly Thr Phe Leu Val Arg Asp Ser Arg Gln Arg Asn Cys
100 105 110 

Phe Phe Ala Leu Ser Val Lys Met Ala Ser Gly Pro Thr Ser Ile Arg
115 120 125 

Val His Phe Gln Ala Gly Arg Phe His Leu Asp Gly Asn Arg Glu Thr
130 135 140 

Phe Asp Cys Leu Phe Glu Leu Leu Glu His Tyr Val Ala Ala Pro Arg 
145 150 155 160 

Arg Met Leu Gly Ala Pro Leu Arg Gln Arg Arg Val Arg Pro Leu Gln
165 170 175 

Glu Leu Cys Arg Gln Arg Ile Val Ala Ala Val Gly Arg Glu Asn Leu
180 185 190 
Ala Arg Ile Pro Leu Asn Pro Val Leu Arg Asp Tyr Leu Ser Ser Phe
195 200 205 



Pro Phe Gln Ile
210 



(2) INFORMATION FOR SEQ ID NO: 13: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 1611 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA 



(ix) FEATURE: 

(A) NAME/KEY: CDS 

(B) LOCATION: 263..1529

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 13: 

CGAATTCCGG GCGGGCTGTG TGAGTCTGTG AGTGGAAGGC GCGCCGGCTC TTTTGTCTGA 60 

GTGTGACCCG GTGGCTTTGT TCCAGGCATT CCGGTGATTT CCTCCGGGCA GTCCGCAGAA 120 

GCCGCAGCGG CCGCCCGCGC TCTCTCTGCA GTCTCCACAC CCGGGAGAGC CTGAGCCCGC 180 

GTCACGCCCC TCAGCCCCCG CTGAGTCCCT TCTCTGTTGT CGCGTCCGAA TCGAGTTCCC 240 

GGAATCAGAC GGTGCCCCAT AG ATG GCC AGC TTT CCC CCG AGG GTT AAC GAG 292
Met Ala Ser Phe Pro Pro Arg Val Asn Glu
1 5 10

AAA GAG ATC GTG AGA TCA CGT ACT ATA GGG GAA CTC TTG GCT CCA GCA 340
Lys Glu Ile Val Arg Ser Arg Thr Ile Gly Glu Leu Leu Ala Pro Ala
15 20 25

GCT CCT TTT GAC AAG AAA TGT GGT GGT GAG AAC TGG ACG GTT GCT TTT 388
Ala Pro Phe Asp Lys Lys Cys Gly Gly Glu Asn Trp Thr Val Ala Phe
30 35 40

GCT CCT GAT GGT TCC TAG TTT GCG TGG TCA CAA GGA TAT CGC ATA GTG 436
Ala Pro Asp Gly Ser Tyr Phe Ala Trp Ser Gln Gly Tyr Arg Ile Val
45 50 55 

AAG CTT GTC CCG TGG TCC CAG TGC CGT AAG AAC TTT CTT TTG CAT GGT 484 
Lys Leu Val Pro Trp Ser Gln Cys Arg Lys Asn Phe Leu Leu His Gly
60 65 70 

TCC AAA AAT GTT ACC AAT TCA AGC TGT CTA AAA TTG GCA AGA CAA AAC 532 
Ser Lys Asn Val Thr Asn Ser Ser Cys Leu Lys Leu Ala Arg Gln Asn
75 80 85 90 

AGT AAT GGT GGT CAG AAA AAC AAG CCT CCT GAG CAC GTT ATA GAC TGT 580 
Ser Asn Gly Gly Gln Lys Asn Lys Pro Pro Glu His Val Ile Asp Cys
95 100 105 

GGA GAC ATA GTC TGG AGT CTT GCT TTT GGG TCT TCA GTT CCA GAA AAA 628
Gly Asp Ile Val Trp Ser Leu Ala Phe Gly Ser Ser Val Pro Glu Lys
110 115 120 

CAG AGT CGT TGC GTT AAT ATA GAA TGG CAT CGG TTC CGA TTT GGA CAG 676 
Gln Ser Arg Cys Val Asn Ile Glu Trp His Arg Phe Arg Phe Gly Gln
125 130 135 

GAT CAG CTA CTC CTT GCC ACA GGA TTA AAC AAT GGT CGC ATC AAA ATC 724 
Asp Gln Leu Leu Leu Ala Thr Gly Leu Asn Asn Gly Arg Ile Lys Ile
140 145 150 

TGG GAT GTA TAT ACA GGA AAA CTC CTC CTT AAT TTG GTA GAC CAC ATT 772 
Trp Asp Val Tyr Thr Gly Lys Leu Leu Leu Asn Leu Val Asp His Ile
155 160 165 170 

GAA ATG GTT AGA GAT TTA ACT TTT GCT CCA GAT GGG AGC TTA CTC CTT 820 
Glu Met Val Arg Asp Leu Thr Phe Ala Pro Asp Gly Ser Leu Leu Leu 
175 180 185 

GTA TCA GCT TCA AGA GAC AAA ACT CTA AGA GTG TGG GAC CTG AAA GAT 868 
Val Ser Ala Ser Arg Asp Lys Thr Leu Arg Val Trp Asp Leu Lys Asp 
190 195 200 

GAT GGA AAC ATG GTG AAA GTA TTG CGG GCA CAT CAG AAT TGG GTG TAG 916 
Asp Gly Asn Met Val Lys Val Leu Arg Ala His Gln Asn Trp Val Tyr
205 210 215 

AGT TGT GCA TTC TCT CCC GAC TGT TCT ATG CTG TGT TCA GTG GGC GCC 964 
Ser Cys Ala Phe Ser Pro Asp Cys Ser Met Leu Cys Ser Val Gly Ala
220 225 230 

AGT AAA GCA GTT TTC CTT TGG AAT ATG GAT AAA TAG ACC ATG ATT AGG 1012 
Ser Lys Ala Val Phe Leu Trp Asn Met Asp Lys Tyr Thr Met Ile Arg
235 240 245 250 

AAG CTG GAA GGT CAT CAC CAT GAT GTT GTA GCT TGT GAC TTT TCT CCT 1060 
Lys Leu Glu Gly His His His Asp Val Val Ala Cys Asp Phe Ser Pro 
255 260 265 

GAT GGA GCA TTG CTA GCT ACT GCA TCC TAT GAC ACT CGT GTG TAT GTC 1108 
Asp Gly Ala Leu Leu Ala Thr Ala Ser Tyr Asp Thr Arg Val Tyr Val 
270 275 280 

TGG GAT CCA CAC AAT GGA GAC CTT CTG ATG GAG TTT GGG CAC CTG TTT 1156 
Trp Asp Pro His Asn Gly Asp Leu Leu Met Glu Phe Gly His Leu Phe 
285 290 295 

CCC TCG CCC ACT CCA ATA TTT GCT GGA GGA GCA AAT GAC CGA TGG GTG 1204 
Pro Ser Pro Thr Pro Ile Phe Ala Gly Gly Ala Asn Asp Arg Trp Val
300 305 310 

AGA GCT GTG TCT TTC AGT CAT GAT GGA CTG CAT GTT GCC AGC CTT GCT 1252 
Arg Ala Val Ser Phe Ser His Asp Gly Leu His Val Ala Ser Leu Ala 
315 320 325 330 

GAT GAT AAA ATG GTG AGG TTC TGG AGA ATC GAT GAG GAT TGT CCG GTA 1300 
Asp Asp Lys Met Val Arg Phe Trp Arg Ile Asp Glu Asp Cys Pro Val
335 340 345 

CAA GTT GCA CCT TTG AGC AAT GGT CTT TGC TGT GCC TTT TCT ACT GAT 1348 
Gln Val Ala Pro Leu Ser Asn Gly Leu Cys Cys Ala Phe Ser Thr Asp
350 355 360 

GGC AGT GTT TTA GCT GCT GGG ACA CAT GAT GGA AGT GTG TAT TTT TGG 1396 
Gly Ser Val Leu Ala Ala Gly Thr His Asp Gly Ser Val Tyr Phe Trp 
365 370 375 

GCC ACT CCA AGG CAA GTC CCT AGC CTT CAA CAT ATA TGT CGC ATG TCA 1444 
Ala Thr Pro Arg Gln Val Pro Ser Leu Gln His Ile Cys Arg Met Ser
380 385 390 

ATC CGA AGA GTG ATG TCC ACC CAA GAA GTC CAA AAA CTG CCT GTT CCT 1492 
Ile Arg Arg Val Met Ser Thr Gln Glu Val Gln Lys Leu Pro Val Pro
395 400 405 410

TCC AAA ATA TTG GCG TTT CTC TCC TAG CGC GGT TAG A CTGAAGACTG 1539 
Ser Lys Ile Leu Ala Phe Leu Ser Tyr Arg Gly *
415 420 

CCTTTCCTGG TAGGCCTGCC AGACAGAGCG CCCTTTACAA GACACACCTC AAGCTTTACC 1599 

TCGTGCCGAA TT 1611 

(2) INFORMATION FOR SEQ ID NO: 14: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 422 amino acids 

(B) TYPE: amino acid 
(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: protein 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 14: 

Met Ala Ser Phe Pro Pro Arg Val Asn Glu Lys Glu Ile Val Arg Ser
1 5 10 15

Arg Thr Ile Gly Glu Leu Leu Ala Pro Ala Ala Pro Phe Asp Lys Lys
20 25 30 

Cys Gly Gly Glu Asn Trp Thr Val Ala Phe Ala Pro Asp Gly Ser Tyr 
35 40 45 

Phe Ala Trp Ser Gln Gly Tyr Arg Ile Val Lys Leu Val Pro Trp Ser
50 55 60 

Gln Cys Arg Lys Asn Phe Leu Leu His Gly Ser Lys Asn Val Thr Asn
65 70 75 80 

Ser Ser Cys Leu Lys Leu Ala Arg Gln Asn Ser Asn Gly Gly Gln Lys
85 90 95 

Asn Lys Pro Pro Glu His Val Ile Asp Cys Gly Asp Ile Val Trp Ser
100 105 110 

Leu Ala Phe Gly Ser Ser Val Pro Glu Lys Gln Ser Arg Cys Val Asn
115 120 125 
Ile Glu Trp His Arg Phe Arg Phe Gly Gln Asp Gln Leu Leu Leu Ala
130 135 140 

Thr Gly Leu Asn Asn Gly Arg Ile Lys Ile Trp Asp Val Tyr Thr Gly
145 150 155 160 

Lys Leu Leu Leu Asn Leu Val Asp His Ile Glu Met Val Arg Asp Leu
165 170 175 

Thr Phe Ala Pro Asp Gly Ser Leu Leu Leu Val Ser Ala Ser Arg Asp 
180 185 190 

Lys Thr Leu Arg Val Trp Asp Leu Lys Asp Asp Gly Asn Met Val Lys 
195 200 205 

Val Leu Arg Ala His Gln Asn Trp Val Tyr Ser Cys Ala Phe Ser Pro
210 215 220 

Asp Cys Ser Met Leu Cys Ser Val Gly Ala Ser Lys Ala Val Phe Leu 
225 230 235 240 

Trp Asn Met Asp Lys Tyr Thr Met Ile Arg Lys Leu Glu Gly His His
245 250 255 

His Asp Val Val Ala Cys Asp Phe Ser Pro Asp Gly Ala Leu Leu Ala 
260 265 270 

Thr Ala Ser Tyr Asp Thr Arg Val Tyr Val Trp Asp Pro His Asn Gly 
275 280 285 

Asp Leu Leu Met Glu Phe Gly His Leu Phe Pro Ser Pro Thr Pro Ile
290 295 300 

Phe Ala Gly Gly Ala Asn Asp Arg Trp Val Arg Ala Val Ser Phe Ser 
305 310 315 320 

His Asp Gly Leu His Val Ala Ser Leu Ala Asp Asp Lys Met Val Arg 
325 330 335 

Phe Trp Arg Ile Asp Glu Asp Cys Pro Val Gln Val Ala Pro Leu Ser
340 345 350 

Asn Gly Leu Cys Cys Ala Phe Ser Thr Asp Gly Ser Val Leu Ala Ala 
355 360 365 
Gly Thr His Asp Gly Ser Val Tyr Phe Trp Ala Thr Pro Arg Gln Val
370 375 380 

Pro Ser Leu Gln His Ile Cys Arg Met Ser Ile Arg Arg Val Met Ser
385 390 395 400 

Thr Gln Glu Val Gln Lys Leu Pro Val Pro Ser Lys Ile Leu Ala Phe
405 410 415 

Leu Ser Tyr Arg Gly * 
420 

(2) INFORMATION FOR SEQ ID NO: 15:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 783 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 15: 

CTGTCTTCCT CCGCAGCGCG AGGCTGGGTA CAGGGTCTAT TGTCTGTGGT TGACTCCGTA 60 

CTTTGGTCTG AGGCCTTCGG GAGCTTTCCC GAGGCAGTTA GCAGAAGCCG CAGCGACCGC 120 

CCCCGCCCGT CTCCTCTGTC CCTGGGCCCG GGAGACAAAC TTGGCGTCAC GCCCTCAGCG 180 

GTCGCCACTC TCTTCTCTGT TGTTGGGTCC GCATCGTATT CCCGGAATCA GACGGTGCCC 240 

CATAGATGGC CAGCTTTCCC CCGAGGGTCA ACGAGAAAGA GATCGTGAGA TCACGTACTA 300 

TAGGTGAACT TTTAGCTCCT GCAGCTCCTT TTGACAAGAA ATGTGGTCGT GAAAATTGGA 360 

CTGTTGCTTT TGCTCCAGAT GGTTCATACT TTGCTTGGTC ACAAGGACAT CGCACAGTAA 420 

AGCTTGTTCC GTGGTCCCAG TGCCTTCAGA ACTTTCTCTT GCATGGCACC AAGAATGTTA 480 

CCAATTCAAG CAGTTTAAGA TTGCCAAGAC AAAATAGTGA TGGTGGTCAG AAAAATAAGC 540 

CTCGTGACAT ATTATAGACT GTGGAGATAT AGTCTGGAGT CTTGCTTTTG GGTCATCAGT 600 
TCCAGAAAAA CAGAGTCGCT GTGTAAATAT AGAATGGCAT CGCTTCAGAT TTGGACAAGA 660

TCAGCTACTT CTTGCTACAG GGTTGAACAA TGGGCGTATC AAAATATGGG ATGTATATCA 720 

GGAAACTCCT CCTTAACTTG GTAGATCATA CTGAAGTGGT CAGAGATTTA ACTTTTGCTC 780 

CAG 783 
(2) INFORMATION FOR SEQ ID NO: 16: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 1122 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 16: 

CTCTGTATGT CTGAATGAAG CTATAACATT TGCCTTTTTA TTGCAGGTTT TCCTTTGGAA 60 

TATGGATAAA TACACCATGA TACGGAAACT AGAAGGACAT CACCATGATG TGGTAGCTTG 120

TGACTTTTCT CCTGATGGAG CATTACTGGC TACTGCATCT TATGATACTC GAGTATATAT 180 

CTGGGATCCA CATAATGGAG ACATTCTGAT GGAATTTGGG CACCTGTTTC CCCCACCTAC 240 

TCCAATATTT GCTGGAGGAG CAAATGACCG GTGGGTACGA TCTGTATCTT TTAGCCATGA 300 

TGGACTGCAT GTTGCAAGCC TTGCTGATGA TAAAATGGTG AGGTTCTGGA GAATTGATGA 360 

GGATTATCCA GTGCAAGTTG CACCTTTGAG CAATGGTCTT TGCTGTGCCT TCTCTACTGA 420

TGGCAGTGTT TTAGCTGCTG GGACACATGA CGGAAGTGTG TATTTTTGGG CCACTCCACG 480 

GCAGGTCCCT AGCCTGCAAC ATTTATGTCG CATGTCAATC CGAAGAGTGA TGCCCACCCA 540 

AGAAGTTCAG GAGCTGCCGA TTCCTTCCAA GCTTTTGGAG TTTCTCTCGT ATCGTATTTA 600 

GAAGATTCTG CCTTCCCTAG TAGTAGGGAC TGACAGAATA CACTTAACAC AAACCTCAAG 660 

CTTTACTGAC TTCAATTATC TGTTTTTAAA GACGTAGAAG ATTTATTTAA TTTGATATGT 720

TCTTGTACTG CATTTTGATC AGTTGAGCTT TTAAAATATT ATTTATAGAC AATAGAAGTA 780

TTTCTGAACA TATCAAATAT AAATTTTTTT AAAGATCTAA CTGTGAAAAC ATACATACCT 840 

GTACATATTT AGATATAAGC TGCTATATGT TGAATGGACC CTTTTGCTTT TCTGATTTTT 900 

AGTTCTGACA TGTATATATT GCTTCAGTAG AGCCACAATA TGTATCTTTG CTGTAAAGTG 960 

CAAGGAAATT TTAAATTCTG GGACACTGAG TTAGATGGTA AATACTGACT TACGAAAGTT 1020 

GAATTGGGTG AGGCGGGCAA ATCACCTGAG GTCAGCAGTT TGAGACTAGC CTGGCAAACA 1080 

TGATGAAACC CTGTCTCTAC TAAAAATACA AAAAAAAAAA AA 1122 

(2) INFORMATION FOR SEQ ID NO: 17: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 2537 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA 

(ix) FEATURE: 

(A) NAME/KEY: CDS 

(B) LOCATION: 422.. 2029 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 17: 

CGGCACGAGC CGGGCTCCGT CCGGAGGAAG CGAGGCTGCG CCGCCGGCCC GGCAGGAGCG 60 

GAGGACGGGA GCGCGGGCGG TCGCGCTCGC CCTGTCGCTG ACTGCGCTGC CCCGGCCCAT 120 

CCTTGCCTGG CCGCAGGTGC CCTGGATGAG GCCGCCGCGC GTGTCCCGGC CGCTGAGTGT 180 

CCCCCGCGGT CGCCCGGCGC CTGCCCTCAA GCGGCCGCCT CTCCTTGCCC GGGTCCCCGT 240 

TTTCCCCCGG CGCAGTCCTC CTCCGGTGGG CGCCTCCGCA CCTCGGCGCA GGCGGCACGG 300

CCCTCGGGCC GGGATGGATC CGCCGGGAAG AGGAAGACAA GCCGGGGCGT TGAGCCCCTG 360

CGCACGGTGC CGCCGCGCGT AGTGGGAGCT TACTCGCAGT AGGCTCTCGC TCTTCTAATC 420

A ATG GAT AAA GTG GGG AAA ATG TGG AAC AAC TTA AAA TAG AGA TGC 466 
Met Asp Lys Val Gly Lys Met Trp Asn Asn Leu Lys Tyr Arg Cys 
1 5 10 15

GAG AAT CTC TTC AGC CAC GAG GGA GGA AGC CGT AAT GAG AAC GTG GAG 514
Gln Asn Leu Phe Ser His Glu Gly Gly Ser Arg Asn Glu Asn Val Glu
20 25 30 

ATG AAC CCC AAC AGA TGT CCG TCT GTC AAA GAG AAA AGC ATC AGT CTG 562 
Met Asn Pro Asn Arg Cys Pro Ser Val Lys Glu Lys Ser Ile Ser Leu
35 40 45 

GGA GAG GCA GCT CCC CAG CAA GAG AGC AGT CCC TTA AGA GAA AAT GTT 610 
Gly Glu Ala Ala Pro Gln Gln Glu Ser Ser Pro Leu Arg Glu Asn Val
50 55 60 

GCC TTA CAG CTG GGA CTG AGC CCT TCC AAG ACC TTT TCC AGG CGG AAC 658 
Ala Leu Gln Leu Gly Leu Ser Pro Ser Lys Thr Phe Ser Arg Arg Asn
65 70 75 

CAA AAC TGT GCC GCA GAG ATC CCT CAA GTG GTT GAA ATC AGC ATC GAG 706 
Gln Asn Cys Ala Ala Glu Ile Pro Gln Val Val Glu Ile Ser Ile Glu
80 85 90 95 

AAA GAC AGT GAC TCG GGT GCC ACC CCA GGA ACG AGG CTT GCA CGG AGA 754 
Lys Asp Ser Asp Ser Gly Ala Thr Pro Gly Thr Arg Leu Ala Arg Arg 
100 105 110 

GAC TCC TAC TCG CGG CAC GCC CCG TGG GGA GGA AAG AAG AAA CAT TCC 802 
Asp Ser Tyr Ser Arg His Ala Pro Trp Gly Gly Lys Lys Lys His Ser 
115 120 125 

TGT TCC ACA AAG ACC CAG AGT TCA TTG GAT ACC GAG AAA AAG TTT GGT 850 
Cys Ser Thr Lys Thr Gln Ser Ser Leu Asp Thr Glu Lys Lys Phe Gly
130 135 140 

AGA ACT CGA AGC GGC CTT CAG AGG CGA GAG CGG CGC TAT GGA GTC AGC 898 
Arg Thr Arg Ser Gly Leu Gln Arg Arg Glu Arg Arg Tyr Gly Val Ser
145 150 155 

TCC ATG CAG GAC ATG GAC AGC GTT TCT AGC CGC GCG GTC GGG AGC CGC 946 
Ser Met Gln Asp Met Asp Ser Val Ser Ser Arg Ala Val Gly Ser Arg
160 165 170 175 
TCC CTG AGG CAG AGG CTC CAG GAC ACG GTG GGT TTG TGT TTT CCC ATG 994
Ser Leu Arg Gln Arg Leu Gln Asp Thr Val Gly Leu Cys Phe Pro Met
180 185 190 



AGA ACT TAG AGC AAG CAG TCA AAG CCA CTC TTT TCC AAT AAA AGA AAA 1042 
Arg Thr Tyr Ser Lys Gln Ser Lys Pro Leu Phe Ser Asn Lys Arg Lys
195 200 205 



ATA CAT CTT TCT GAA TTA ATG CTG GAG AAA TGC CCT TTT CCT GCT GGC 1090 
Ile His Leu Ser Glu Leu Met Leu Glu Lys Cys Pro Phe Pro Ala Gly
210 215 220 



TCG GAT TTA GCA CAA AAG TGG CAT TTG ATT AAA CAG CAT ACC GCC CCT 1138 
Ser Asp Leu Ala Gln Lys Trp His Leu Ile Lys Gln His Thr Ala Pro
225 230 235 



GTG AGC CCA CAC TCA ACA TTT TTT GAT ACA TTT GAT CCA TCA CTG GTG 1186
Val Ser Pro His Ser Thr Phe Phe Asp Thr Phe Asp Pro Ser Leu Val 
240 245 250 255 



TCT ACA GAA GAT GAA GAA GAT AGG CTT CGC GAG AGA AGA CGG CTT AGT 1234
Ser Thr Glu Asp Glu Glu Asp Arg Leu Arg Glu Arg Arg Arg Leu Ser
260 265 270



ATC GAA GAA GGG GTG GAT CCC CCT CCC AAC GCA CAA ATA CAC ACC TTT 1282
Ile Glu Glu Gly Val Asp Pro Pro Pro Asn Ala Gln Ile His Thr Phe
275 280 285



GAA GCT ACT GCA CAG GTC AAC CCA TTG TAT AAG CTG GGA CCA AAG TTA 1330
Glu Ala Thr Ala Gln Val Asn Pro Leu Tyr Lys Leu Gly Pro Lys Leu
290 295 300



GCT CCT GGG ATG ACA GAG ATA AGT GGA GAT GGT TCT GCA ATT CCA CAA 1378 
Ala Pro Gly Met Thr Glu Ile Ser Gly Asp Gly Ser Ala Ile Pro Gln
305 310 315 

GCA ATT GTG ACT CAG AAG AGG ATT CAA CCA CCC TAT GTC TGC AGT CAC 1426 
Ala Ile Val Thr Gln Lys Arg Ile Gln Pro Pro Tyr Val Cys Ser His
320 325 330 335 



GGA GGC AGA AGC AGC GCC AGG TGT CCG GGG ACA GCC ACG CGC ACG TTA 1474 
Gly Gly Arg Ser Ser Ala Arg Cys Pro Gly Thr Ala Thr Arg Thr Leu 
340 345 350 



GCA GAC AGG GAG CTT GGA AAG TTC ATA CGC AGA TCG ATT ACA TAC ACT 1522
Ala Asp Arg Glu Leu Gly Lys Phe Ile Arg Arg Ser Ile Thr Tyr Thr
355 360 365 

GCC TCG TGC CAG ATT TGC TTC AGA TCA CAG GGA ATC CCT GTT ACT GGG 1570
Ala Ser Cys Gln Ile Cys Phe Arg Ser Gln Gly Ile Pro Val Thr Gly
370 375 380 

GCG TGA TGG AGO GAT ACG AGG CCG AAG CCC TTC TAG AAG GGA AAC CGG 1618 
Ala * Trp Thr Asp Thr Arg Pro Lys Pro Phe * Lys Gly Asn Arg 
385 390 395 

AAG GCA CGT TCT TGC TCA GGG ACT CTG CAC AGG AGG ACT ACC TCT TCT 1666 
Lys Ala Arg Ser Cys Ser Gly Thr Leu His Arg Arg Thr Thr Ser Ser 
400 405 410 415 

CTG TGA GCT TCC GCC GCT ACA ACA GGT CTC TGC ACG CCC GGA TCG AGC 1714
Leu * Ala Ser Ala Ala Thr Thr Gly Leu Cys Thr Pro Gly Ser Ser 
420 425 430 

AGT GGA ACC ACA ACT TCA GCT TCG ATG CCC ATG ACC CCT GCG TGT TTC 1762 
Ser Gly Thr Thr Thr Ser Ala Ser Met Pro Met Thr Pro Ala Cys Phe 
435 440 445 

ACT CCT CCA CGT CAC GGG GCT TCT CGA ACA CTA TAA AGA CCC CAG CTC 1810 
Thr Pro Pro Arg His Gly Ala Ser Arg Thr Leu * Arg Pro Gln Leu
450 455 460 

TTG CAT GTT TTT TGA ACC GTT GCT AAC GAT ATC ACT GAA TAG AAC TTT 1858 
Leu His Val Phe * Thr Val Ala Asn Asp Ile Thr Glu * Asn Phe
465 470 475 

CCC TTT CAG CCT GCA GTA TAT CTG CCG CGC AGT GAT CTG CAG ATG CAC 1906 
Pro Phe Gln Pro Ala Val Tyr Leu Pro Arg Ser Asp Leu Gln Met His
480 485 490 495 

TAC GTA TGA TGG GAT TGA CGG GCT CCC GCT ACC GTC GAT GTT ACA GGA 1954 
Tyr Val * Trp Asp * Arg Ala Pro Ala Thr Val Asp Val Thr Gly 
500 505 510 

TTT TTT AAA AGA GTA TCA TTA TAA ACA AAA AGT TAG GGT TCG CTG GTT 2002 
Phe Phe Lys Arg Val Ser Leu * Thr Lys Ser * Gly Ser Leu Val 
515 520 525 

AGA ACG AGA CCA GTC AAA GCA AAG TAACTCCTGT CCCCAAAGGG CACTAACTAA 2056
Arg Thr Arg Pro Val Lys Ala Lys
530 535
GTCTGCTCCT CCCGTGCATC GAACTGCACC CATAGGAGGC AGTCAGCTGC TAGGATTTCC 2116 
CACCCAGAAT GGGAGCTTAG TCATTAGCCT CTGCCCTATG GGGTCCGCTG TTCCTCAGAC 2176 
AAAGGTGCCT AGGGACAGCA AGATGGCTTG CAGGTGTTCG GTGGGCTGTG ACAACTGAGG 2236 

GAGGCAACTC TGGGGCATTT GCTATGAAGA ATTCTATTTC TTACCGAAGA ACAAATTATT 2296 

AATATTGGAT GGGTATTTCA ATAGTGTGAC TAATGTTTGA AATTATTTTT TCTAAGAATT 2356

TTTCTATAAC CTTCAGAAAA AGTAGTGATG TTTGTAGTTA CTATAAATCA AGCTTTGAAA 2416 

GTTCAAAACA AACAAGTTAA ATAAAAGACT ACCTTCCTTT TAGAGAAAAC AAATGCAAGT 2476 

TTTCCCAGCC ACAGGCATTG TGCACTGTTA ATGTTGCTTG TTATCAGCTC CTTTCTCCTC 2536 

C 2537 

(2) INFORMATION FOR SEQ ID NO: 18: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 535 amino acids

(B) TYPE: amino acid 
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: protein 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 18: 

Met Asp Lys Val Gly Lys Met Trp Asn Asn Leu Lys Tyr Arg Cys Gln
1 5 10 15

Asn Leu Phe Ser His Glu Gly Gly Ser Arg Asn Glu Asn Val Glu Met 
20 25 30 

Asn Pro Asn Arg Cys Pro Ser Val Lys Glu Lys Ser Ile Ser Leu Gly
35 40 45 

Glu Ala Ala Pro Gln Gln Glu Ser Ser Pro Leu Arg Glu Asn Val Ala
50 55 60 

Leu Gln Leu Gly Leu Ser Pro Ser Lys Thr Phe Ser Arg Arg Asn Gln
65 70 75 80



Asn Cys Ala Ala Glu Ile Pro Gln Val Val Glu Ile Ser Ile Glu Lys
85 90 95 

Asp Ser Asp Ser Gly Ala Thr Pro Gly Thr Arg Leu Ala Arg Arg Asp 
100 105 110 

Ser Tyr Ser Arg His Ala Pro Trp Gly Gly Lys Lys Lys His Ser Cys 
115 120 125 

Ser Thr Lys Thr Gln Ser Ser Leu Asp Thr Glu Lys Lys Phe Gly Arg
130 135 140 

Thr Arg Ser Gly Leu Gln Arg Arg Glu Arg Arg Tyr Gly Val Ser Ser
145 150 155 160 

Met Gln Asp Met Asp Ser Val Ser Ser Arg Ala Val Gly Ser Arg Ser
165 170 175 

Leu Arg Gln Arg Leu Gln Asp Thr Val Gly Leu Cys Phe Pro Met Arg
180 185 190 

Thr Tyr Ser Lys Gln Ser Lys Pro Leu Phe Ser Asn Lys Arg Lys Ile
195 200 205 

His Leu Ser Glu Leu Met Leu Glu Lys Cys Pro Phe Pro Ala Gly Ser 
210 215 220 

Asp Leu Ala Gln Lys Trp His Leu Ile Lys Gln His Thr Ala Pro Val
225 230 235 240 

Ser Pro His Ser Thr Phe Phe Asp Thr Phe Asp Pro Ser Leu Val Ser 
245 250 255 

Thr Glu Asp Glu Glu Asp Arg Leu Arg Glu Arg Arg Arg Leu Ser Ile
260 265 270 

Glu Glu Gly Val Asp Pro Pro Pro Asn Ala Gln Ile His Thr Phe Glu
275 280 285 

Ala Thr Ala Gin Val Asn Pro Leu Tyr Lys Leu Gly Pro Lys Leu Ala 
290 295 300 



Pro Gly Met Thr Glu Ile Ser Gly Asp Gly Ser Ala Ile Pro Gln Ala
305 310 315 320

Ile Val Thr Gln Lys Arg Ile Gln Pro Pro Tyr Val Cys Ser His Gly
325 330 335 

Gly Arg Ser Ser Ala Arg Cys Pro Gly Thr Ala Thr Arg Thr Leu Ala 
340 345 350 

Asp Arg Glu Leu Gly Lys Phe Ile Arg Arg Ser Ile Thr Tyr Thr Ala
355 360 365 

Ser Cys Gln Ile Cys Phe Arg Ser Gln Gly Ile Pro Val Thr Gly Ala
370 375 380 

* Trp Thr Asp Thr Arg Pro Lys Pro Phe * Lys Gly Asn Arg Lys 
385 390 395 400 

Ala Arg Ser Cys Ser Gly Thr Leu His Arg Arg Thr Thr Ser Ser Leu 
405 410 415 

* Ala Ser Ala Ala Thr Thr Gly Leu Cys Thr Pro Gly Ser Ser Ser
420 425 430

Gly Thr Thr Thr Ser Ala Ser Met Pro Met Thr Pro Ala Cys Phe Thr 
435 440 445 

Pro Pro Arg His Gly Ala Ser Arg Thr Leu * Arg Pro Gln Leu Leu
450 455 460 

His Val Phe * Thr Val Ala Asn Asp Ile Thr Glu * Asn Phe Pro
465 470 475 480 

Phe Gln Pro Ala Val Tyr Leu Pro Arg Ser Asp Leu Gln Met His Tyr
485 490 495 

Val * Trp Asp * Arg Ala Pro Ala Thr Val Asp Val Thr Gly Phe 
500 505 510 

Phe Lys Arg Val Ser Leu * Thr Lys Ser * Gly Ser Leu Val Arg 
515 520 525 

Thr Arg Pro Val Lys Ala Lys 
530 535 

(2) INFORMATION FOR SEQ ID NO: 19:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 1221 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 19: 

GATTAAACAG CATACAGCTC CTGTGAGCCC ACATTCAACA TTTTTTGATA CTTTGATCCA 60 

TCTTTGGTTT CTACAGAAGA TGAAGAAGAT AGGCTTAGAG AGAGAAGGCG GCTTAGTATT 120 

GAAGAAGGGG TTGATCCCCC TCCCAATGCA CAAATACATA CATTTGAAGC TACTGCACAG 180 

GTTAATCCAT TATTAAACTG GGACCAAAAT TAGCTCCTGG AATGACTGAA ATAAGTGGGG 240 

ACAGTTCTGC AATTCCACAA GCTAATTGTG ACTCGGAAGA GGATACAACC ACCCTGTGTT 300

GCAGTCACGG AGGCAGAAGC AGCGTCAGAT ATCTGGAGAC AGCCATACCC ATGTTAGCAG 360 

ACAGGGAGCT TGGAAAGTCC ACACACAGAT TGATTACATA CACTGCTTCG TGCCTGATTT 420 

GCTTCAAATT ACAGGGAATC CCTGTTACTG GGGAGTGATG GACCGTTATG AAGCAGAAGC 480 

CCTTCTCGAA GGGAAACCTG AAGGCACGTT TTTGCTCAGG GACTCTGCGC AAGAGGACTA 540 

CTTCTTCTCT GTGAGCTTCC GCCGATACAA CAGATCCCTG CATGCCCGAA TTGAGCAGTG 600 

GAATCACAAC TTTAGTTTCG ACGCCCATGA CCCGTGTGTA TTTCACTCCT CCACTGTAAC 660 

GGGACTTTTA GAACATTATA AAGATCCCAG TTCGTGCATG TTTTTTGAAC CATTGCTTAC 720 

TATATCACTA AATAGGACTT TCCCTTTTAG CCTGCAGTAT ATCTGTCGCG CGGTAATCTG 780 

CAGGTGCACT ACGTATGATG GAATTGATGG GCTCCCTCTA CCCTCAATGT TACAGGATTT 840 

TTTAAAAGAG TATCATTATA AACAAAAAGT TAGAGTTCGC TGGTTGGAAC GAGAACCAGT 900 

CAAGGCAAAG TAAACTCTCC GGTCCCCAAA GGGTGTTAAC TAGGTCCGCT TTCATGTGCA 960 
TCAGACAGTA CACCTATAGC AAGCACACGT AGCAGTGTTA GGCTTTTTCA TACAGTATGT 1020

AAGCTTAGTG TTAGTATCTG TCAGATGCTA CCTGCTGTTA CTTATTCAGA TAAACATGGT 1080 

GCCTATTGGA ACAATAGCGG ATAGAGCTAC AGGTGTTCAG TAAGACTACA AAAACATTTT 1140 

GCCTATTTCG CTAACAGTTT GGTTTTTAAT GGCTGTGGTA TTTGAGTGAG GCAACTCTGG 1200 

GGCATTTGTT ATGAAGAAAT G 1221 



(2) INFORMATION FOR SEQ ID NO: 20: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 2369 base pairs

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA 



(ix) FEATURE: 

(A) NAME/KEY: CDS 

(B) LOCATION: 116.. 1330 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 20: 



GGCACGAGGC GGTGGTGGCG GCGGCGGGCG CGGCCGCGGC GGGGCGGGCG CGGAATGAAG 60 

GCCCACGGCC CTGGGGGCTG AGGCGCCCGC CGCCTGGGGC GGGCCGCGCG TCCTC ATG 118 

Met 
1 

GAG GCC GGA GAG GAG CCG CTG CTG CTG GCT GAA CTC AAG CCT GGG CGC 166 
Glu Ala Gly Glu Glu Pro Leu Leu Leu Ala Glu Leu Lys Pro Gly Arg 
5 10 15 

CCC CAC CAG TTC GAC TGG AAG TCA AGC TGC GAG ACC TGG AGC GTG GCC 214 
Pro His Gln Phe Asp Trp Lys Ser Ser Cys Glu Thr Trp Ser Val Ala
20 25 30 

TTC TCG CCA GAC GGT TCC TGG TTC GCC TGG TCT CAA GGA CAC TGC GTG 262 
Phe Ser Pro Asp Gly Ser Trp Phe Ala Trp Ser Gln Gly His Cys Val
35 40 45 



GTC AAG CTG GTC CCC TGG CCC TTA GAG GAA CAG TTC ATC CCT AAA GGA 310 
Val Lys Leu Val Pro Trp Pro Leu Glu Glu Gln Phe Ile Pro Lys Gly
50 55 60 65 



TTC GAA GCC AAG AGC CGA AGC AGC AAG AAT GAG CCA AAA GGA CGG GGC 358 
Phe Glu Ala Lys Ser Arg Ser Ser Lys Asn Asp Pro Lys Gly Arg Gly 
70 75 80 



AGT CTG AAG GAG AAG ACG CTG GAC TGT GGC CAG ATT GTG TGG GGG CTG 406
Ser Leu Lys Glu Lys Thr Leu Asp Cys Gly Gln Ile Val Trp Gly Leu
85 90 95



GCC TTC AGC CCG TGG CCC TCT CCA CCC AGC AGG AAA CTC TGG GCA CGT 454
Ala Phe Ser Pro Trp Pro Ser Pro Pro Ser Arg Lys Leu Trp Ala Arg
100 105 110



CAC CAT CCC CAG GCG CCT GAT GTT TCT TGC CTG ATC CTG GCC ACA GGT 502
His His Pro Gln Ala Pro Asp Val Ser Cys Leu Ile Leu Ala Thr Gly
115 120 125



CTC AAC GAT GGG CAG ATC AAG ATT TGG GAG GTA CAG ACA GGC CTC CTG 550
Leu Asn Asp Gly Gln Ile Lys Ile Trp Glu Val Gln Thr Gly Leu Leu
130 135 140 145

CTT CTG AAT CTT TCT GGC CAC CAA GAC GTC GTG AGA GAT CTG AGC TTC 598
Leu Leu Asn Leu Ser Gly His Gln Asp Val Val Arg Asp Leu Ser Phe
150 155 160

ACG CCC AGC GGC AGT TTG ATT TTG GTC TCT GCA TCC CGG GAT AAG ACA 646
Thr Pro Ser Gly Ser Leu Ile Leu Val Ser Ala Ser Arg Asp Lys Thr
165 170 175
175 



CTT CGA ATT TGG GAC CTG AAT AAA CAC GGT AAG CAG ATC CAG GTG TTA 694
Leu Arg Ile Trp Asp Leu Asn Lys His Gly Lys Gln Ile Gln Val Leu
180 185 190



TCC GGC CAT CTG CAG TGG GTT TAC TGC TGC TCC ATC TCC CCT GAC TGT 742 
Ser Gly His Leu Gln Trp Val Tyr Cys Cys Ser Ile Ser Pro Asp Cys
195 200 205 



AGC ATG CTG TGC TCT GCA GCT GGG GAG AAG TCG GTC TTT CTG TGG AGC 790 
Ser Met Leu Cys Ser Ala Ala Gly Glu Lys Ser Val Phe Leu Trp Ser 
210 215 220 225

ATG CGG TCC TAG ACA CTA ATC CGG AAA CTA GAA GGC CAC CAA AGC AGT 838
Met Arg Ser Tyr Thr Leu Ile Arg Lys Leu Glu Gly His Gln Ser Ser
230 235 240 

GTT GTC TCC TGT GAT TTC TCT CCT GAT TCA GCC TTG CTT GTC ACA GCT 886 
Val Val Ser Cys Asp Phe Ser Pro Asp Ser Ala Leu Leu Val Thr Ala 
245 250 255 

TCG TAT GAC ACC AGT GTG ATT ATG TGG GAC CCC TAC ACC GGC GCG AGG 934 
Ser Tyr Asp Thr Ser Val Ile Met Trp Asp Pro Tyr Thr Gly Ala Arg
260 265 270 

CTG AGG TCA CTT CAT CAC ACA CAA CTT GAA CCC ACC ATG GAT GAC AGT 982 
Leu Arg Ser Leu His His Thr Gln Leu Glu Pro Thr Met Asp Asp Ser
275 280 285 

GAC GTC CAC ATG AGC TCC CTG AGG TCC GTG TGC TTC TCA CCT GAA GGC 1030 
Asp Val His Met Ser Ser Leu Arg Ser Val Cys Phe Ser Pro Glu Gly 
290 295 300 305 

TTG TAT CTC GCT ACG GTG GCA GAT GAC AGG CTG CTC AGG ATC TGG GCT 1078 
Leu Tyr Leu Ala Thr Val Ala Asp Asp Arg Leu Leu Arg Ile Trp Ala
310 315 320 

CTG GAA CTG AAG GCT CCG GTT GCC TTT GCT CCG ATG ACC AAT GGT CTT 1126
Leu Glu Leu Lys Ala Pro Val Ala Phe Ala Pro Met Thr Asn Gly Leu
325 330 335 

TGC TGC ACG TTC TTC CCA CAC GGT GGA ATT ATT GCC ACA GGG ACG AGA 1174 
Cys Cys Thr Phe Phe Pro His Gly Gly Ile Ile Ala Thr Gly Thr Arg
340 345 350 

GAT GGC CAT GTC CAG TTC TGG ACA GCT CCC CGG GTC CTG TCC TCA CTG 1222 
Asp Gly His Val Gln Phe Trp Thr Ala Pro Arg Val Leu Ser Ser Leu
355 360 365 

AAG CAC TTA TGC AGG AAA GCC CTC CGA AGT TTC CTG ACA ACG TAT CAA 1270 
Lys His Leu Cys Arg Lys Ala Leu Arg Ser Phe Leu Thr Thr Tyr Gln
370 375 380 385 

GTC CTA GCA CTG CCA ATC CCC AAG AAG ATG AAA GAG TTC CTC ACA TAC 1318 
Val Leu Ala Leu Pro Ile Pro Lys Lys Met Lys Glu Phe Leu Thr Tyr
390 395 400 
AGG ACT TTC TAGCAGTGCC GGCTCCCCCA CCTCCTGCAG CAGCAGCAGT 1367
Arg Thr Phe
405

ACAAGGGACT GGCTAGGATG GAGTCAGGCA GCTCACACTG GACCAGTGTG GACCTTCCTT 1427 

CCTCCCATGG CATGTGCAAG TAGGTCTGCG TGACCCCACT TCTGTGGTGC CGGCCTTACC 1487 

TCGTCTTCAT CCGTGGTGAG CAGCCTTCGT CAGTCTAGTT GTGTTGAAGC CAAGTGCAGT 1547 

TGTGGATGTT GCTGGGGTAA TAAAGGCAAG CGGGCTCCAG AGCCTCTCTG GTGGCGGCCA 1607 

AGCCACACTC CCTTAACTGG GAAGTACCTG CCACGTAGGG CATTTCTGCT GCCTATTTCC 1667 

AGCCAGCGGC TGCATGGTTT GAAGTTCCTC CGTTGTGGTC AGAAGAACTC TGGTGTTTGG 1727 

TTCCCTGCTC AGCTGCGCGT GGACTGGGCT GAGCTCCTCA CCATACACTA GTGCCGGCTT 1787 

TTGTTTCCTG TAAACAGTGG TTGCATGTGT AGAGAAGTAA CAAGCGAGTA TTCAGATCAT 1847 

ACGAGGAGGC GTTCCTCGGT GCATGACGGT CAGATGGCCA TTTATCAGCA TATTTATTTG 1907 

TATTTTCTCA GCACATAGTA AGGTACAACT GTGTTTTCTC AATTGTCTCG AAAAAACAGA 1967 

GTTCTTAAGT GGCCCAGTTG TGGAGCCAAG TCTAAGTCGT GTGGAGTCAG TGCTGACATC 2027

ACTGGCTTGT GCTGTCTGTC ACATGTGTTT GTCTCTGCTG CTTGACCTCA TGGGATGTAC 2087 

CCTCCAGTTC AACTGCCCAA AACAGACAGC CCCTTCCAAG CACCGTTCTT TGACAGCGGT 2147 

AGCAGCTACC TATTCAAGAC GCCTCACACA AAATCTGCCT TAGAAAGTTA ATATATTTTA 2207 

AATTATTTTA AAAGAAACTC AACATCTTAT TCTTTGGCCT TTCTTAATTG ATGCTTTATG 2267 

GAGGCAGTGT TAACATTGTA CAGTGTATGC ATAGAGGAGT CTCCTCTATT TGAAGAACAA 2327 

TGCAAAATGA GGCTTTCATT GAAGGGAAAA AAAAAAAAAA AA 2369

(2) INFORMATION FOR SEQ ID NO: 21: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 404 amino acids 

(B) TYPE: amino acid 
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: protein

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 21: 

Met Glu Ala Gly Glu Glu Pro Leu Leu Leu Ala Glu Leu Lys Pro Gly
1 5 10 15

Arg Pro His Gln Phe Asp Trp Lys Ser Ser Cys Glu Thr Trp Ser Val
20 25 30 

Ala Phe Ser Pro Asp Gly Ser Trp Phe Ala Trp Ser Gln Gly His Cys
35 40 45 

Val Val Lys Leu Val Pro Trp Pro Leu Glu Glu Gln Phe Ile Pro Lys
50 55 60 

Gly Phe Glu Ala Lys Ser Arg Ser Ser Lys Asn Asp Pro Lys Gly Arg 
65 70 75 80 

Gly Ser Leu Lys Glu Lys Thr Leu Asp Cys Gly Gln Ile Val Trp Gly
85 90 95 

Leu Ala Phe Ser Pro Trp Pro Ser Pro Pro Ser Arg Lys Leu Trp Ala 
100 105 110 

Arg His His Pro Gln Ala Pro Asp Val Ser Cys Leu Ile Leu Ala Thr
115 120 125 

Gly Leu Asn Asp Gly Gln Ile Lys Ile Trp Glu Val Gln Thr Gly Leu
130 135 140 

Leu Leu Leu Asn Leu Ser Gly His Gln Asp Val Val Arg Asp Leu Ser
145 150 155 160 

Phe Thr Pro Ser Gly Ser Leu Ile Leu Val Ser Ala Ser Arg Asp Lys
165 170 175 

Thr Leu Arg Ile Trp Asp Leu Asn Lys His Gly Lys Gln Ile Gln Val
180 185 190 

Leu Ser Gly His Leu Gln Trp Val Tyr Cys Cys Ser Ile Ser Pro Asp
195 200 205 

Cys Ser Met Leu Cys Ser Ala Ala Gly Glu Lys Ser Val Phe Leu Trp 
210 215 220 
Ser Met Arg Ser Tyr Thr Leu Ile Arg Lys Leu Glu Gly His Gln Ser
225 230 235 240 

Ser Val Val Ser Cys Asp Phe Ser Pro Asp Ser Ala Leu Leu Val Thr 
245 250 255 

Ala Ser Tyr Asp Thr Ser Val Ile Met Trp Asp Pro Tyr Thr Gly Ala
260 265 270 

Arg Leu Arg Ser Leu His His Thr Gln Leu Glu Pro Thr Met Asp Asp
275 280 285 

Ser Asp Val His Met Ser Ser Leu Arg Ser Val Cys Phe Ser Pro Glu 
290 295 300 

Gly Leu Tyr Leu Ala Thr Val Ala Asp Asp Arg Leu Leu Arg Ile Trp
305 310 315 320 

Ala Leu Glu Leu Lys Ala Pro Val Ala Phe Ala Pro Met Thr Asn Gly 
325 330 335 

Leu Cys Cys Thr Phe Phe Pro His Gly Gly Ile Ile Ala Thr Gly Thr
340 345 350 

Arg Asp Gly His Val Gln Phe Trp Thr Ala Pro Arg Val Leu Ser Ser
355 360 365 

Leu Lys His Leu Cys Arg Lys Ala Leu Arg Ser Phe Leu Thr Thr Tyr 
370 375 380 

Gln Val Leu Ala Leu Pro Ile Pro Lys Lys Met Lys Glu Phe Leu Thr
385 390 395 400 

Tyr Arg Thr Phe 



(2) INFORMATION FOR SEQ ID NO: 22: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 1246 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 
(ii) MOLECULE TYPE: DNA

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:22: 

GACACTGCAT CGTCAAACTG ATCCCCTGGC CGTTGGAGGA GCAGTTCATC CCTAAAGGGT 60 

TTGAAGCCAA AAGCCGAAGT AGCAAAAATG AGACGAAAGG GCGGGGCAGC CCAAAAGAGA 120

AGACGCTGGA CTGTGGTCAG ATTGTCTGGG GGCTGGCCTT CAGCCTGTGC TTTCCCCACC 180 

CAGCAGGAAG CTCTGGGCAC GCCACCACCC CCAAGTGCCC GATGTCTCTT GCCTGGTTCT 240 

TGCTACGGGA CTCAACGATG GGCAGATCAA GATCTGGGAG GTGCAGACAG GGCTCCTGCT 300 

TTTGAATCTT TCCGGCCACC AAGATGTCGT GAGAGATCTG AGCTTCACAC CCAGTGGCAG 360 

TTTGATTTTG GTCTCCGCGT CACGGGATAA GACTCTTCGC ATCTGGGACC TGAATAAACA 420 

CGGTAAACAG ATTCAAGTGT TATCGGGCCA CCTGCAGTGG GTTTACTGCT GTTCCATCTC 480 

CCCAGACTGC AGCATGCTGT GCTCTGCAGC TGGAGAGAAG TCGGTCTTTC TATGGAGCAT 540 

GAGGTCCTAC ACGTTAATTC GGAAGCTAGA GGGCCATCAA AGCAGTGTTG TCTCTTGTGA 600 

CTTCTCCCCC GACTCTGCCC TGCTTGTCAC GGCTTCTTAC GATACCAATG TGATTATGTG 660 

GGACCCCTAC ACCGGCGAAA GGCTGAGGTC ACTCCACCAC ACCCAGGTTG ACCCCGCCAT 720 

GGATGACAGT GACGTCCACA TTAGCTCACT GAGATCTGTG TGCTTCTCTC CAGAAGGCTT 780 

GTACCTTGCC ACGGTGGCAG ATGACAGACT CCTCAGGATC TGGGCCCTGG AACTGAAAAC 840 

TCCCATTGCA TTTGCTCCTA TGACCAATGG GCTTTGCTGG CACATTTTTT CCACATGGTG 900 

GAGTCATTGC CACAGGGACA AGAGATGGCC ACGTCCAGTT CTGGACAGCT CCTAGGGTCC 960 

TGTCCTCACT GAAGCACTTA TGCCGGAAAG CCCTTCGAAG TTTCCTAACA ACTTACCAAG 1020 

TCCTAGCACT GCCAATCCCC AAGAAAATGA AAGAGTTCCT CACATACAGG ACTTTTTAAG 1080 

CAACACCACA TCTTGTGCTT CTTTGTAGCA GGGTAAATCG TCCTGTCAAA GGGAGTTGCT 1140 

GGAATAATGG GCCAAACATC TGGTCTTGCA TTGAAATAGC ATTTCTTTGG GATTGTGAAT 1200

AGAATGTAGC AAAACCAGAT TCCAGTGTAC TAGTCATGGA TTTTTC 1246



(2) INFORMATION FOR SEQ ID NO: 23: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 422 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:23: 

ACCATGGTTC CAAGTCCTCT CCCCTGTGGT CAAGTTGCCC GAATGTTGGG CCCAAGTGCC 60 

TTTTCCTCCT TGGGCCTCCC CTTCTGACCT GCAGGACAGT TTTCCGGAGC CCATTTGGTA 120 

TGAGGTATTA ATTAGCCTTA ACTAAATTAC AGGGGACTCA GAGGCCGTGC TCCTGACCGA 180 

TCCAGACACT ATTTTTTTTT TTTTTTTTTA ACAATGGTGT GCATGTGCAG GAAATGACAA 240 

ATTTGTATGT CAGATTATAC AAGGATGTAT TCTTAAACCG CATGACTATT CAGATGGCTA 300

CTGAGTTATC AGTGGCCATT TATTAGCATC ATATTTATTT GTATTTTCTC AACAGATGTT 360

AAGGTACAAC TGTGTTTTTC TCGATTATCT AAAAACCATA GTACTTAAAT TGAAAAAAAA 420

AA 422

(2) INFORMATION FOR SEQ ID NO: 24:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 2019 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 24:

GGCACGAGGC GGGGTCAGGG CGGAGGCTGA GGACCAAGTA GGCATGGCGG AGGGCGGGAC 60 

CGGCCCCGAT GGACGGGCCG GCCCGGGACC CGCAGGTCCT AATCTGAAGG AGTGGCTGAG 120 

GGAGCAGTTC TGTGACCATC CACTGGAGCA CTGTGACGAT ACAAGACTCC ATGATGCAGC 180 

CTATGTAGGG GACCTCCAGA CCCTCAGGAA CCTACTGCAA GAGGAGAGCT ACCGGAGCCG 240 

CATCAATGAG AAGTCTGTCT GGTGCTGCGG CTGGCTTCCC TGCACACCAC TGAGGATCGC 300 

AGCCACTGCA GGCCATGGGA ACTGTGTGGA CTTCCTCATA CGCAAAGGGG CCGAGGTGGA 360 

CCTGGTGGAT GTCAAGGGGC AGACTGCCCT GTATGTGGCT GTAGTGAACG GGCACTTGGA 420

GAGCACTGAG ATCCTTTTGG AAGCTGGTGC TGATCCCAAC GGCAGCCGGC ACCACCGCAG 480 

CACTCCTGTG TACCATGCCT YTCGTGTGGG TAGGGACGAC ATCCTGAAGG CTCTTATCAG 540 

GTATGGGGCA GATGTTGATG TCAACCATCA TCTGAATTCT GACACCCGGC CCCCTTTTTC 600 

ACGGCGGCTA ACCTCCTTGG TGGTCTGTCC TCTATACATC AGTGCTGCCT ACCATAACCT 660 

TCAGTGCTTC AGGCTGCTCT TGCAGGCTGG GGCAAATCCT GACTTCAATT GCAATGGCCC 720

TGTCAACACC CAGGAGTTCT ACAGGGGATC CCCTGGGTGT GTCATGGATG CTGTCCTGCG 780 

CCATGGCTGT GAAGCAGCCT TCGTGAGTCT GTTGGTAGAG TTTGGAGCCA ACCTGAACCT 840 

GGTGAAGTGG GAATCCCTGG GCCCAGAGGC AAGAGGCAGA AGAAAGATGG ATCCTGAGGC 900 

CTTGCAGGTC TTTAAAGAGG CCAGAAGTAT TCCCAGGACC TTGCTGAGTT TGTGCCGGGT 960 

GGCTGTGAGA AGAGCTCTTG GCAAATACCG ACTGCATCTG GTTCCCTCGC TGCCGCTGCC 1020 

AGACCCCATA AAGAAGTTTT TGCTTTATGA GTAGCATTCA CATGCAGTGC TGACTGCAAT 1080 

GTGGAAGCCG ATCACCTGCA GTGAAAACTG ACACAGACTC TGGCATCCTG GGAACCATGG 1140 

CCTGTGCTGC CAGCTTGATC CTTGGCTGTC AGTGAAGAAA AAACGGCTGT GTTCTCTTGG 1200

ACTGTGATTC TATCTCAGGT GCTTGGGCCA TCGAACGCTC CTTGAGTCAT TGTCAACTGA 1260 
GAGGCACATA CAAACTTAAT TTTGTTCCTC TTCAGTCTCT CTGTTTTGGA TTCTTCCTGG 1320

CAATGTGTGC AGCATGGGCT GAGCCTGGTG ATTGCCCTAG TGGGGAAGGC TTTTTTCTCC 1380 

AGGCTATGCA TCTATTTATG TTCCTACTTT GCAATTTATT GTTCTTTTAA GGCTTGATAT 1440 

CAAAACAGAA AGAGGTTTGT TAAGAAAAGA TATAGGGAGA AAGGAATTCC GGTTCCGTGC 1500 

ACTTGCTAGC CTGCTTTCCT TGCCTGGGTT TGTCTGTCTA TGCTGCCTGG TGCACATCCC 1560 

TTCTCTTTGC TGCCACTGTT CTATTTTGGG AGTTGTCTTC CGTCTAAGAT GGCTTCTGGG 1620 

GTTCTATCTT ATTGCACAGA GGTCCCAGAA CAGTGTTCAT AGGGCACCAT CTGCTCTGCC 1680 

AAGGGTTTTC TGATGTCTTA CCCTGGGGAT CTTCAGACAG TGGTTACCTT TAGGAGACCC 1740 

ACCTGGAACT AACCATTAAG TGACTGCCCA CATTCAGATC AGGGACCATC TTAATAGTAC 1800 

TCACTGCCAG TCCTCACAAG AGAAGATGAC ACGGGTGCTC TCTTCAGACA CTCCCATACA 1860 

GGAAGTTGGA AAATGTCTTG GTCACCTGGG TTGTTCCCAG GCTACAACTT CTTGGTGTTC 1920 

CACTAARACC AGRATATCCT AGTTTTTTGG GTTGACTGTT CCCTCCCCAC TTTCCTTGAA 1980 

NCCCAATGCC CNTTTGTKTN GGTTGCTTCC CTAAAAKTT 2019 

(2) INFORMATION FOR SEQ ID NO: 25: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 350 amino acids 

(B) TYPE: amino acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 25: 

Ala Arg Gly Gly Val Arg Ala Glu Ala Glu Asp Gln Val Gly Met Ala
1 5 10 15

Glu Gly Gly Thr Gly Pro Asp Gly Arg Ala Gly Pro Gly Pro Ala Gly
20 25 30 

Pro Asn Leu Lys Glu Trp Leu Arg Glu Gln Phe Cys Asp His Pro Leu
35 40 45 

Glu His Cys Asp Asp Thr Arg Leu His Asp Ala Ala Tyr Val Gly Asp 
50 55 60 

Leu Gln Thr Leu Arg Asn Leu Leu Gln Glu Glu Ser Tyr Arg Ser Arg
65 70 75 80 

Ile Asn Glu Lys Ser Val Trp Cys Cys Gly Trp Leu Pro Cys Thr Pro
85 90 95 

Leu Arg Ile Ala Ala Thr Ala Gly His Gly Asn Cys Val Asp Phe Leu
100 105 110 

Ile Arg Lys Gly Ala Glu Val Asp Leu Val Asp Val Lys Gly Gln Thr
115 120 125 

Ala Leu Tyr Val Ala Val Val Asn Gly His Leu Glu Ser Thr Glu Ile
130 135 140 

Leu Leu Glu Ala Gly Ala Asp Pro Asn Gly Ser Arg His His Arg Ser 
145 150 155 160 

Thr Pro Val Tyr His Ala Xaa Arg Val Gly Arg Asp Asp Ile Leu Lys
165 170 175 

Ala Leu Ile Arg Tyr Gly Ala Asp Val Asp Val Asn His His Leu Asn
180 185 190 

Ser Asp Thr Arg Pro Pro Phe Ser Arg Arg Leu Thr Ser Leu Val Val 
195 200 205 

Cys Pro Leu Tyr Ile Ser Ala Ala Tyr His Asn Leu Gln Cys Phe Arg
210 215 220 

Leu Leu Leu Gin Ala Gly Ala Asn Pro Asp Phe Asn Cys Asn Gly Pro 
225 230 235 240 

Val Asn Thr Gln Glu Phe Tyr Arg Gly Ser Pro Gly Cys Val Met Asp
245 250 255 
Ala Val Leu Arg His Gly Cys Glu Ala Ala Phe Val Ser Leu Leu Val
260 265 270 

Glu Phe Gly Ala Asn Leu Asn Leu Val Lys Trp Glu Ser Leu Gly Pro 
275 280 285 

Glu Ala Arg Gly Arg Arg Lys Met Asp Pro Glu Ala Leu Gln Val Phe
290 295 300 

Lys Glu Ala Arg Ser Ile Pro Arg Thr Leu Leu Ser Leu Cys Arg Val
305 310 315 320 

Ala Val Arg Arg Ala Leu Gly Lys Tyr Arg Leu His Leu Val Pro Ser 
325 330 335 

Leu Pro Leu Pro Asp Pro Ile Lys Lys Phe Leu Leu Tyr Glu
340 345 350 



(2) INFORMATION FOR SEQ ID NO: 26: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 419 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 26: 

GCATCCATGG CGGAGGGCGG CAGCACGACG GGCGGGCAGG GCCGGGCTCC GCAGGTCGTA 60 

ATCTGAAGGA GTGGCTGAGG GAGCAATTTT GTGATCATCC GCTGGAGCAC TGTGAGGACA 120 

CGAGGCTCCA TGATGCAGCT TACGTCGGGG ACCTCCAGAC CCTCAGGAGC CTATTGCAAG 180 

AGGAGAGCTA CCGGAGCCGC ATCAACGAGA AGTCTGTCTG GTGCTGTGGC TGGCTCCCCT 240 

GCACACCGTT GCGAATCGCG GCCACTGCAG GCCATGGGAG CTGTGTGGAC TTCCTCATCC 300 

GGAAGGGGGC CGAGGTGGAT CTGGTGGACG TAAAAGGACA GACGGCCCTG TATGTGGCTG 360 
TGGTGAACGG GCACCTAGAG AGTACCCAGA TCCTTCTCGA AGCTGGCGCG GACCCCAAC 419

(2) INFORMATION FOR SEQ ID NO: 27: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 595 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 27: 

GAGGAAGAAG AAAAGTGGAC CCTGAGGCCT TGCAGGTCTT TAAAGAGGCC AGAAGTGTTC 60

CCAGAACCTT GCTGTGTCTG TGCCGTGTGG CTGTGAGAAG AGCTCTTGGC AAAACCGGCT 120

TCATCTGATT CCTTCGCTGC CTCTGCCAGA CCCCATAAAG AAGTTTCTAC TCCATGAGTA 180

GACTCCAAGT GCTGCGGTTG ATTCCAGTGA GGGAGAAAGT GATCTGCAGG GAGGTGGACA 240

CCGAGCCCTG AGTGCTGTGC TGCTGCTGGT CTCCTGATGG CTGTTGCTGC AGAAGATGTC 300

CTCGTAGACT GTCATTGCTC CTCAGGTGCC TGGGCCGCTG AACAGTCCTT GGGTCATTGT 360

CAGCTGAGAG GCTTATACTA AAGTTATTAT TGTTTTTCCC AAGTTCTCTG TTCTGGATTT 420

TCAGTTGCAT ATTAATGTAA CGGGCCATGG GGTATGTACA TGTAGGGGCT GAGGTTGGAG 480

GCCTACTAAT TTCCTGTAGG GAAGACTCCC AGCACTTCTG GAACTGTGCT TCTCTTTATT 540

TTTCTACTTC TCAATTTGAT GGTTCGATTA AAGCCTTCTA GTATCTCAAT GAAAA 595

(2) INFORMATION FOR SEQ ID NO: 28: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 896 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 
(ii) MOLECULE TYPE: DNA



(ix) FEATURE: 

(A) NAME/KEY: CDS 

(B) LOCATION: 4.. 396 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 28: 

CTG ATG TCC GCA ATT CTG AAG GTT GGA CAC CAC TGC TGG CTG CCT GTG 48 
Met Ser Ala Ile Leu Lys Val Gly His His Cys Trp Leu Pro Val
1 5 10 15

ACA TCC GCT GTC AAT CCC CAA AGG ATG CTG AGG CCA CCA CCA ACC GCT 96 
Thr Ser Ala Val Asn Pro Gln Arg Met Leu Arg Pro Pro Pro Thr Ala
20 25 30 

GTT TTC AAC TGT GCC GCT TGC TGC TGT CTG TGG GGG CAG ATG CTG ATG 144 
Val Phe Asn Cys Ala Ala Cys Cys Cys Leu Trp Gly Gln Met Leu Met
35 40 45 

AAT ACA TAC CGT GTA GTT CAG CTT CCT GAG GAG GCC AAG GGC TTG GTG 192 
Asn Thr Tyr Arg Val Val Gln Leu Pro Glu Glu Ala Lys Gly Leu Val
50 55 60 

CCA CCA GAG ATT CTA CAG AAG TAC CAT GGA TTC TAC TCT TCC CTC TTT 240 
Pro Pro Glu Ile Leu Gln Lys Tyr His Gly Phe Tyr Ser Ser Leu Phe
65 70 75 

GCC TTG GTG AGG CAG CCC AGG TCG CTG CAG CAT CTC TGC CGT TGT GCG 288 
Ala Leu Val Arg Gln Pro Arg Ser Leu Gln His Leu Cys Arg Cys Ala
80 85 90 95 

CTC CGC ACT CAC CTG GAG GGC TGT CTG CCC CAT GCA CTA CCG CGC CTT 336 
Leu Arg Ser His Leu Glu Gly Cys Leu Pro His Ala Leu Pro Arg Leu 
100 105 110 

CCC CTG CCA CCG CGC ATG CTC CGC TTT CTG CAG CTG GAC TTT GAG GAT 384
Pro Leu Pro Pro Arg Met Leu Arg Phe Leu Gln Leu Asp Phe Glu Asp
115 120 125 

CTG CTC TAC TAGGCTTGCT GCCCTGTGAA CAAAGCAGAC CCCACCCCCA 433 
Leu Leu Tyr 
130 
CCCCAAGGGC ATCTCTCAGC AATGAATGAT GCAAGGCGGT CTGTCTTCAA GTCAGGAGTG 493

GACGCCTTGA TCCACACTTG AGAGAAGAGG CCAGATCAGC ACCYGGCTGG TAGTGATNGC 553 

AGAGGGCACC TGTGCAGATC TGTGTGCGCA CTGGAAATCT CTAGGCTGAA GGCYAGAGCA 613 

AATGGTGCAR GTGTTAGTCC TTGGGANGAG AGACAGANGG TGAGAAAGCA AGACAGAGGT 673 

GAGAGTGCAC ATGTCAAGTG GTAGATTGCC TTAAAAGAAA GCTAAAAAAA GAAAAAGATT 733 

CGGGCGAACT TCTTTAGGGG TAATGCTGCA GCGTGTTAAA CTGACTGACC AGCGTCCATA 793 

TCTTTGGACC CTTCCCGGGT GAAAAAGCCC CTTCATCCTC CAGCGCTCCC CAAGGGTGCT 853 

TAGCAATACC GGGTGCTTTT CTGCCGCAAA GTGAGTTACC AAA 896 

(2) INFORMATION FOR SEQ ID NO: 29: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 130 amino acids

(B) TYPE: amino acid 
(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: protein 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 29: 

Met Ser Ala Ile Leu Lys Val Gly His His Cys Trp Leu Pro Val Thr
1 5 10 15

Ser Ala Val Asn Pro Gln Arg Met Leu Arg Pro Pro Pro Thr Ala Val
20 25 30 

Phe Asn Cys Ala Ala Cys Cys Cys Leu Trp Gly Gln Met Leu Met Asn
35 40 45 

Thr Tyr Arg Val Val Gln Leu Pro Glu Glu Ala Lys Gly Leu Val Pro
50 55 60 

Pro Glu Ile Leu Gln Lys Tyr His Gly Phe Tyr Ser Ser Leu Phe Ala
65 70 75 80

Leu Val Arg Gln Pro Arg Ser Leu Gln His Leu Cys Arg Cys Ala Leu
85 90 95

Arg Ser His Leu Glu Gly Cys Leu Pro His Ala Leu Pro Arg Leu Pro
100 105 110 

Leu Pro Pro Arg Met Leu Arg Phe Leu Gln Leu Asp Phe Glu Asp Leu
115 120 125 

Leu Tyr 
130

(2) INFORMATION FOR SEQ ID NO: 30: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 436 base pairs

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 30: 

GTGGGGGCGT CATCATGACC TCCTCTAGGG CTCTGCAACA TGACTCCTGT GGTGCAAATC 60 

AACAAATTGT TCACTGATGA ATCCACAAGG ATCTCTGGGC CTACAACCAG GTCCTGGTCC 120 

ACATGACTGT CGTCTTCGGA GAAGGCACCA CTCGCCCCCG GCAGGTACGG CTGACACCTC 180 

CATGGGAGAA GACGTATCCA GGCAGCAGCT GCGCGGCCCT TCAAGAGGGC ACATCCCGTC 240 

ATCTAAAGGC ACGGTGTACT GAAGGTAGTC CTGAGACATG AGTCCGATTA CTACAGGCAC 300 

GTGTTCCTCC AGGTGGAGGC TCAGGTCCCC GGGTGAGCTG GGGCTGCAGC GGGACTCAGG 360 

GCGCGGCTCT GGCTGCAGGT CTCGCAGCTC CCTGGGCTGT AGCTCCCGCA GATCCTTGCG 420 

CACACCGTTG ACTGGT 436 

(2) INFORMATION FOR SEQ ID NO: 31:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 2180 base pairs

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 31:

TTAATAGTAC CTACATAGTA GAAAATTATA ACTCCACTTT AAAACAATGT TTTCTTTCTA 60 

TTCAAATCAA TTTAAAACTT TTTATAAACA TTAATGTTGC AAGAGAATCC AGTCCATTTA 120 

TGAAAATTAG TTGACAATCA AGTTCACCCA AGAAAATGTT GACTAAGCTA AAGAAATCAC 180 

AGATAAAACA TTTTACCAAA AGGATAGGTA ACACACAAAA AAATGCTATC ACAGGAAGCT 240 

ATGATCATCT AATATTTCTT TAATAATAAT TCTAGTTCCA TAGGTTTTCA TGTTATGCCA 300 

ATTTGTACCC GAGTTTAATT ACAGAAAAGG CAACAATTTC TAAATTGGTG GTATACATTT 360 

CTTTACAATT TTTTAATGTA AGGCCATTTA TTAAAATAGA CAAACTAGAA GATGAAAACG 420 

AAGGCAACAG AAAAATTCAA CTTTTCACAA CCAAAAGAAT TAGCACAACC TTAGAAATAA 480 

TTTAGAAAAA AGTGTTGTTA AAAGATATGT TGCAGATCTC CGTTCCATTA CCCAAGATTA 540 

TGTCAATTCA CGATTCTAAA TAAATCTTTT TAAAGTAAGA GATTAAAAAC TCATCTTCAG 600 

TGTATATGTA AATTCCGTGG TTTTATCACA CAGGTATGTT TATTCAACAC TGCTTTGGAA 660 

ATGGACCATT TAAAAGGACA TGGCAATTTC CATTCTGTTA AGTTTCATTC AACCTTTACT 720 

TAGGGGTTGA TTACCACATG AAATGTGCTT TTAATGCATA AAAATCACAG TGGATTAGCC 780 

AGCAAAAGGG ACTGGGCGGG GGGGGCATTG AGGAGAATTT GATAATTCAC ATTGTGATTA 840 

TTCTGCACAT TGATGAAACA TAATTCACAC CTCTAAAACC TCAAGACTTC CCTTTTTTAA 900 

AGAACCAAAA TAAACCCAAG ACACCTTGCT GACACTTCCC CACCCCTAAA CAAACTGATG 960 

ACTCTTTTAC ACATAAAACT GAAATAGTTA TGGCAGCAAA AGATTTTGAT GGCAATGAAA 1020

GTTTGTAAAC TGTATTTCAA TCTCTTGTTC TTATTCCCAA AGTGCAAGAT GCAGGGTTCT 1080

CAATCTTTCA GTAGTGCTTC TCCTGTAAAT AATCCTTCAT TTTGTTTGGC AAAGGCAGTT 1140 

TCTGAATTAA GTCTATTCTG GTATACTGAC GTATAACAAA ACGACACAGG TACTGCAACG 1200 

AGCGCACCTA TGAACCCCGG AACACTGGTT GGCAAGTTCT GACGGAAGTG CAGATTCCAG 1260 

GCAGCGAGAC CTTGAATAAC AAAAAGCTCC CATTTTCAGA GTCCCTGATT GAATGCTCCA 1320 

ATTAGATCAA CTATGGACGT ATGTCCTTCC ACATCGGCTG TTCATAAAAG CTAAACCTAC 1380 

CATTTGAGTG CTCAATTCTA GTGTGAAGTG TTTTACCATG GGAGCGAAAG TCACAGCTTA 1440 

AAAGGTAACG GTCGTCAGAA CTGTCCCGAA CAAGAAAAGA ACCATCTGGC ACGTTTGCTA 1500 

GCTTCCCTTC TGCCTCCCAA CGTGTGATTG GTCCCCAGTA CCATCCTTGC TTTGCAAGTT 1560 

TTTTCAGCTC CTCTGTAAGG CTTGTCACAA CCATGGGACC ACTACTTTGC ACTGAGTCAT 1620 

AAACTCTTGC AACCCCAGGA GCAGAGTTCG GATCAAAATT CAAATGACAG CGCATAACTT 1680 

TCAGCCACGT GGGGCTTTCT GTCCAGTGAG TCCACTGAAA GTTCCCCTTT GGGATTTGGA 1740 

TTATTCCTGC ATTGGAGTAA CCAATGGTGA AGATTGGAGG GACATCCATC GTGAACCCGC 1800 

TCTCCGGGGT TCTGCAACAT GACTCCCGTG GTGCCAATCA ACAAGCCATT CACCGGACTG 1860 

ATCCACGAAG ATCTCTGGGG CGACAACTAG GTCCTGGTCT ACCTGACTCT CATCCTCGGG 1920 

GAAAGCGCGC CCTCCCACTT GAGGAGGAAC CGCAGAGACT TCCATGGGAG AAGAGCTGTC 1980 

CAGACAATAG CTCCGTGATC CTTCCAAAGG ATACATCCCC TCATCTAAAG GCACAGTATA 2040 

CTGAATGTAG TCCTGAGGCA TAAGTCCAAT AACGACAGGC ACATGTTCAT CCAGGTGAAG 2100 

ATGCAGGTCT CCATTATGAG AAGCCGAGCT CTTCAGTGAA TTGGCTTGCT CCTGGCACGT 2160 

GGTCTCAGAC TGGAGGTCGT 2180 

(2) INFORMATION FOR SEQ ID NO: 32:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 2649 base pairs

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 32: 

GGCACGAGGC TGTGTCCAGC ACACAGAGAG GGCCCGGCCA TCTGCTTTGG TTCAGAGCCC 60 

TGTGTCTGTC TGTCACTTAG ACTCTTCCTC CCGGCTCGCA GCTCACCCTC CATCCTCCTT 120 

ACTGGCTCCA GCATGACTCG CTTCTCTTAT GCAGAGTACT TTGCTCTGTT TCACTCTGGC 180 

TCTGCACCTT CCAGGTCCCC TTCGTCTCCC GAGAACCCAC CGGCCCGCGC ACCCCTGGGT 240 

CTGTTCCAAG GGGTCATGCA GAAGTATAGC AGCAACCTGT TCAAGACCTC CCAGATGGCG 300 

GCTATGGACC CCGTGCTGAA GGCCATCAAG GAAGGGGATG AAGAGGCCTT GAAGATCATG 360 

ATCCAGGATG GGAAGAATCT TGCAGAGCCC AACAAGGAGG GCTGGCTGCC GCTCCACGAG 420

GCTGCCTACT ATGGCCAGCT GGGCTGCCTG AAAGTCCTGC AGCAAGCCTA CCCAGGGACC 480 

ATTGACCAAC GCACACTGCA GGAAGAGACA GCATTATACC TGGCCACATG CAGAGAACAC 540

CTGGATTGCC TCCTGTCGCT GCTCCAGGCG GGGGCAGAGC CTGACATCTC TAACAAATCC 600 

AGGGAGACTC CACTTTACAA AGCCTGTGAG CGCAAGAACG CGGAGGCGGT GAGGATATTG 660 

GTGCGATACA ACGCAGACGC CAACCACCGC TGTAACAGGG GCTGGACCGC ACTGCACGAG 720

TCTGTCTCCC GCAATGACCT GGAGGTCATG GAGATCCTAG TGAGTGGCGG GGCCAAGGTG 780 

GAGGCCAAGA ATGTCTACAG CATCACCCCT TTGTTTGTGG CTGCCCAGAG TGGGCAGCTG 840 

GAGGCCCTGA GGTTCCTGGC CAAGCATGGT GCAGACATCA ACACGCAGGC CAGTGACAGT 900 

GCATCAGCCC TCTACGAGGC CAGCAAGAAT GAGCATGAAG ACGTGGTAGA GTTTCTTCTC 960 

TCTCAGGGCG CCGATGCTAA CAAAGCCAAC AAGGACGGCC TGCTCCCCCT GCATGTTGCC 1020 

TCCAAGAAGG GCAACTATAG AATAGTGCAG ATGCTGCTGC CTGTGACCAG CCGCACGCGC 1080

GTGCGCCGTA GCGGCATCAG CCCGCTGCAC CTAGCGGCCG AGCGCAACCA CGACGCGGTG 1140 

CTGGAGGCGC TGCTGGCCGC GCGCTTCGAC GTGAACGCAC CTCTGGCTCC CGAGCGCGCC 1200 

CGCCTCTACG AGGACCGCCG CAGTTCTGCG CTCTACTTCG CTGTGGTCAA CAACAATGTG 1260 

TACGCCACCG AGCTGTTGCT GCTGGCGGGC GCGGACCCCA ACCGCGATGT CATCAGCCCT 1320 

CTGCTCGTGG CCATCCGCCA CGGCTGCCTG CGCACCATGC AGCTGCTGTT GGACCATGGC 1380

GCCAACATCG ACGCCTACAT CGCCACTCAC CCCACCGCCT TTCCAGCCAC CATCATGTTT 1440 

GCCATGAAGT GCCTGTCGTT ACTCAAGTTC CTTATGGACC TCGGCTGCGA TGGCGAGCCC 1500 

TGCTTCTCCT GCCTGTACGG CAACGGGCCG CACCACCCGC CCCGCGACCT GGCCGCTTCC 1560 

ACGACGCACC CGTGGACGAC AAGGCACCTA GCGTGGTGCA GTTCTGTGAG TTCCTGTCGG 1620 

CCCCGGAAGT GAGCCGCTGG GCGGGACCCA TCATCGATGT CCTCCTGGAC TATGTGGGCA 1680 

ACGTGCAGCT GTGCTCCCGG CTGAAGGAGC ACATCGACAG CTTTGAGGAC TGGGCTGTCA 1740 

TCAAGGAGAA GGCAGAACCT CCGAGACCTC TGGCTCACCT CTGCCGGCTG CGGGTTCGGA 1800 

AGGCCATAGG AAAATACCGG ATAAAACTCC TGGACACACT GCCGCTTCCC GGCAGGCTAA 1860 

TCAGATACTT GAAATATGAG AATACACAGT AACCAGCCTG GAGAGGAGAT GTGGCCTTCA 1920 

GACTGTTTCC GGGACGCCCC AGGTGGCCTG CATCCAGGAC CCCCTGGGGT CAGAACAGGT 1980 

GTGACCTTGC TGGTTCTTTG CTGGAGCTTC ACCCAAAGTG AGAACCTGAT GTGGGGAGTG 2040 

GACGTGGAAC CTCTGCTTTC ACACTGTCAG CGGATCGCAG ACCCGCTCTG CTTCTGGCCA 2100 

TAGCCAGAGA CCTTCAACCT GGGGCCAGGG GAGAGCTGGT CTGGGCAAGG TGGCCCAGGC 2160 

AGGAATCCTG GCCTTAAGCT GGAGAACTTG TAGGAATCCC TCACTGGACC CTCAGCTTTC 2220 

AGGCTGCGAG GGAGACGCCC AGCCCAAGTA TTTTATTTCC GTGACACAAT AACGTTGTAT 2280 

CAGAAAAAAA AAAAAACATG GGCGCAGCTT ATTCCTTAGT AGGGTATTTA CTTGCATGCG 2340 

CGCTTAAAGC TACTGGAAAC ATGCGTTCCA CTATGCTTGA GAATCCCCTT GCACTGGTAA 2400 

ACGAGAGCCG ACGTGCTTCA AGGTTGGATT TTTGGTTGCC CCTTTGGCGT TCCGCGGGTT 2460

TGTCCGACGT AATTGACCCC GTGTTTTGTC ACTTTCGAGT GTTCCGACTA TTGGGGGGCT 2520 

TTTGGTTGTC CCCAAAATTG TGGGTGGTGT GCGGACGCCA CGAGAAGTGG TTCATGGGCG 2580 

ATAATCATTA CTGGAGAATG TAGAGCGGCG GTTTTACGAA TAAATATTTT TTAAGCCGCC 2640 

TTCCCAAAA 2649 

(2) INFORMATION FOR SEQ ID NO: 33:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 495 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS : single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 33: 

CCTCCTGAGA GTTCGCCGGC CCGGGCCCAA TGGGTTGTTC CAAGGGGTCA TGCAGAAATA 60 

CAGCAGCAGC TTGTTCAAGA CCTCCCAGCT GGCGCCTGCG GACCCCTTGA TAAAGGCCAT 120 

CAAGGATGCG ATGAAGAGGC CTTGAAGACC ATGATCAAGG AAGGGAAGAA TCTCGCAGAG 180 

CCCAACAAGG AGGGCTGGCT GCCGCTGCAC GAGGCCGCAT ACTATGGCCA GGTGGGCTGC 240 

CTGAAAGTCC TGCAGCGAGC GTACCCAGGG ACCATCGACC AGCGCACCCT GCAGGAGGAA 300

ACAGCCGTTT ACTTGGCAAC GTGCAGGGGC CACCTGGACT GTCTCCTGTC ACTGCTCCAA 360

GCAGGGGCAG AGCGGGACAT CTCCAACAAA TCCCGAGAGA ACCGCTCTAC AAAGCCTGTG 420 

AGCGCAAGAA CGCGGAAGCC GTGAAGATTC TTGGTGCAGC ACAACGCAGA CACCAACAAC 480 

GCTGCAACCG GGCTG 495 

(2) INFORMATION FOR SEQ ID NO: 34:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 709 base pairs

(B) TYPE: nucleic acid 

(C) STRANDEDNESS : single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 34: 

GTGCAGCTCT GCTCGCGGCT GAAGGAACAC ATCGACAGCT TTGAGGACTG GGCCGTCATC 60 

AAGGAGAAGG CAGAACCTCC AAGACCTCTG GCTCACCTTT GCCGACTGCG GGTTCGAAAG 120 

GCCATTGGGA AATACCGTAT AAAACTCCTA GACACCTTGC CGCTCCCAGG CAGGCTGATT 180 

AGATACCTGA AATACGAGAA CACCCAGTAA CTGGGGCCAC GGGGAGAGAG GAGTAGCCCC 240 

TCAGACTCTT CTTACTAAGT CTCAGGACGT CGGTGTTCCC AACTCCAAGG GGACCTGGTG 300 

ACAGACGAGG CTGCAGGCTG CCTCCCTCTC AGCCTGGACA GCTACCAGGA TCTCACTGGG 360 

TCTCAGGGCC CAGAGCTTTG GCCAGAGCAG AGAACAGAAT GTGTCAAGGA GAAGAATCAT 420 

TTGTTTACAA ACTGATGAGC AGATCCCAGA CCTTCTCTAC CTTCAGGAAT GGCAGAAACC 480 

TCTATTCCTG GGGCCAGGGC AGAGCTTGAG GTGTTCTGGG GAAGGTGGTG CTCAGAGCCT 540 

TCCCTGTGCC CCTCCACTTG TTCTGGAAAA CTCACCACTT GACTTCAGAG CTTTCTCTCC 600 

AAAGACTAAG ATGAAGACGT GGCCCAAGGT AGGGGGTAGG GGGAGCCTGG GTCTTGGAGG 660 

GCTTTGTTAA GTATTAATAT AATAAATGTT ACACATGTGA AAAAAAAAA 709 

(2) INFORMATION FOR SEQ ID NO: 35: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 848 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA



(ix) FEATURE: 

(A) NAME/KEY: CDS 

(B) LOCATION: 1..624 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 35:

TTG GAG AAG TGT GGT TGG TAT TGG GGG CCA ATG AAT TGG GAA GAT GCA 48 
Leu Glu Lys Cys Gly Trp Tyr Trp Gly Pro Met Asn Trp Glu Asp Ala 
1 5 10 15

GAG ATG AAG CTG AAA GGG AAA CCA GAT GGT TCT TTC CTG GTA CGA GAC 96 
Glu Met Lys Leu Lys Gly Lys Pro Asp Gly Ser Phe Leu Val Arg Asp 
20 25 30 

AGT TCT GAT CCT CGT TAG ATC CTG AGC CTC AGT TTC CGA TCA CAG GGT 144 
Ser Ser Asp Pro Arg Tyr Ile Leu Ser Leu Ser Phe Arg Ser Gln Gly
35 40 45 

ATC ACC CAC CAC ACT AGA ATG GAG CAC TAC AGA GGA ACC TTC AGC CTG 192 
Ile Thr His His Thr Arg Met Glu His Tyr Arg Gly Thr Phe Ser Leu
50 55 60 

TGG TGT CAT CCC AAG TTT GAG GAC CGC TGT CAA TCT GTT GTA GAG TTT 240 
Trp Cys His Pro Lys Phe Glu Asp Arg Cys Gln Ser Val Val Glu Phe
65 70 75 80 

ATT AAG AGA GCC ATT ATG CAC TCC AAG AAT GGA AAG TTT CTC TAT TTC 288 
Ile Lys Arg Ala Ile Met His Ser Lys Asn Gly Lys Phe Leu Tyr Phe
85 90 95 

TTA AGA TCC AGG GTT CCA GGA CTG CCA CCA ACT CCT GTC CAG CTG CTC 336 
Leu Arg Ser Arg Val Pro Gly Leu Pro Pro Thr Pro Val Gln Leu Leu
100 105 110 

TAT CCA GTG TCC CGA TTC AGC AAT GTC AAA TCC CTC CAG CAC CTT TGC 384 
Tyr Pro Val Ser Arg Phe Ser Asn Val Lys Ser Leu Gln His Leu Cys
115 120 125 

AGA TTC CGG ATA CGA CAG CTC GTC AGG ATA GAT CAC ATC CCA GAT CTC 432 
Arg Phe Arg Ile Arg Gln Leu Val Arg Ile Asp His Ile Pro Asp Leu
130 135 140 



WO 98/20023    PCT/AU97/00729

CCA CTG CCT AAA CCT CTG ATC TCT TAT ATC CGA AAG TTC TAG TAG TAT 480 
Pro Leu Pro Lys Pro Leu Ile Ser Tyr Ile Arg Lys Phe Tyr Tyr Tyr
145 150 155 160 

GAT CCT CAG GAA GAG GTA TAG CTG TCT CTA AAG GAA GCG CAG CGT CAG 528 
Asp Pro Gln Glu Glu Val Tyr Leu Ser Leu Lys Glu Ala Gln Arg Gln
165 170 175 

TTT CCA AAC AGA AGC AAG AGG TGG AAC CCT CCA CGT AGC GAG GGG CTC 576 
Phe Pro Asn Arg Ser Lys Arg Trp Asn Pro Pro Arg Ser Glu Gly Leu 
180 185 190 

CCT GCT GGT CAC CAC CAA GGG CAT TTG GTT GCC AAG CTC CAG CTT TGAAGAACCA 631
Pro Ala Gly His His Gln Gly His Leu Val Ala Lys Leu Gln Leu
195 200 205 

AATTAAGCTA CCATGAAAAG AAGAGGAAAA GTGAGGGAAC AGGAAGGTTG GGATTCTCTG 691 

TGCAGAGACT TTGGTTCCCC ACGCAAGCCC TGGGGCTTGG AAGAAGCACA TGACCGTACT 751 

CTGCGTGGGG CTCCACCTCA CACCCACCCC TGGGCATCTT AGGACTGGAG GGGCTCCTTG 811 

GAAAACTGGA AGAAGTCTCA ACACTGTTTC TTTTTCA 848 

(2) INFORMATION FOR SEQ ID NO: 36: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 207 amino acids 

(B) TYPE: amino acid 
(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: protein 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:36: 

Leu Glu Lys Cys Gly Trp Tyr Trp Gly Pro Met Asn Trp Glu Asp Ala 
1 5 10 15

Glu Met Lys Leu Lys Gly Lys Pro Asp Gly Ser Phe Leu Val Arg Asp 
20 25 30 

Ser Ser Asp Pro Arg Tyr Ile Leu Ser Leu Ser Phe Arg Ser Gln Gly
35 40 45 

Ile Thr His His Thr Arg Met Glu His Tyr Arg Gly Thr Phe Ser Leu
50 55 60 

Trp Cys His Pro Lys Phe Glu Asp Arg Cys Gln Ser Val Val Glu Phe
65 70 75 80 

Ile Lys Arg Ala Ile Met His Ser Lys Asn Gly Lys Phe Leu Tyr Phe
85 90 95 

Leu Arg Ser Arg Val Pro Gly Leu Pro Pro Thr Pro Val Gln Leu Leu
100 105 110 

Tyr Pro Val Ser Arg Phe Ser Asn Val Lys Ser Leu Gln His Leu Cys
115 120 125 

Arg Phe Arg Ile Arg Gln Leu Val Arg Ile Asp His Ile Pro Asp Leu
130 135 140 

Pro Leu Pro Lys Pro Leu Ile Ser Tyr Ile Arg Lys Phe Tyr Tyr Tyr
145 150 155 160 

Asp Pro Gln Glu Glu Val Tyr Leu Ser Leu Lys Glu Ala Gln Arg Gln
165 170 175 

Phe Pro Asn Arg Ser Lys Arg Trp Asn Pro Pro Arg Ser Glu Gly Leu 
180 185 190 

Pro Ala Gly His His Gln Gly His Leu Val Ala Lys Leu Gln Leu
195 200 205 



(2) INFORMATION FOR SEQ ID NO: 37: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 464 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 37: 
GTTCCAAGCC TAACCCATCT TTGTCGTTTG GAAATTCGGG CCAGTCTAAA AGCAGAGCAC 60

CTTCACTCTG ACATTTTCAT CCATCAGTTG CCACTTCCCA GAAGTCTGCA GAACTATTTG 120

CTCTATGAAG AGGTTTTAAG AATGAATGAG ATTCTAGAAC CAGCAGCTAA TCAGGATGGA 180

GAAACCAGCA AGGCCACCTG ACACAGGTCC TTTAATTCTG TTTAGTCACA AAAGACGGCT 240

TGTGTGACTG TTTGGATTTG GTGATCAAAT GTCCATGTTT ACAGTTGCTT TTCCCAGTTT 300

GTGTCTTTCC CAATATTGTG AACCTTATCC ATCTTGCCTT ACTCAGTTTT ATTTCTAGTG 360

CACTTTGTTG TGTATTATTT GTTTACCTGA CCATTTTCTA CTTTATTCTG CTAATAAACT 420

GTAATTCTGA AAAAAAAAAA AAAAAAAAAA AAAAAAAAAA AAAA 464



(2) INFORMATION FOR SEQ ID NO: 38: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 747 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:38: 

GGGGATCGAA AGCGGGGGCT TCTGGGACGC AGCTCTGGAG ACGCGGCCTC GGACCAGCCA 60 

TTTCGGTGTA GAAGTGGCAG CACGGCAGAC TGGTCAAACA AATGGATTTT ACAGAGGCTT 120

ACGCGGACAC GTGCTCTACA GTTGGACTTG CTGCCAGGGA AGGCAATGTT AAAGTCTTAA 180 

GGAAACTGCT CAAAAAGGGC CGAAGTGTCG ATGTTGCTGA TAACAGGGGA TGGATGCCAA 240 

TTCATGAAGC AGCTTATCAC AACTCTGTAG AATGTTTGCA AATGTTAATT AATGCAGATT 300 

CATCTGAAAA CTACATTAAG ATGAAGACCT TTGAAGGTTT CTGTGCTTTG CATCTCGCTG 360 

CAAGTCAAGG ACATTGGAAA ATCGTACAGA TTCTTTTAGA AGCTGGGGCA GATCCTAATG 420 

CAACTACTTT AGAAGAAACG ACACCATTGT TTTTAGCTGT TGAAAATGGA CAGATAGATG 480 

TGTTAAGGCT GTTGCTTCAA CACGGAGCAA ATGTTAATGG ATCCCATTCT ATGTGTGGAT 540

GGAACTCCTT GCACCAGGCT TCTTTTCAGG AAAATGCTGA GATCATAAAA TTGCTTCTTA 600

GAAAAGGAGC AAACAAGGAA TGCCAGGATG ACTTTGGAAT CACACCTTTA TTTGTGGCTG 660

CTCAGTATGG CCAAGCTAGA AAGCTTTGAA GCATACTTAT TTCATCCGGG TGCAAATGTC 720

AATTGTCAAG CCTTGGACAA AGCTACC 747



(2) INFORMATION FOR SEQ ID NO: 39: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 1018 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 39: 

CACAAATGGG ACCATACAAA AATCTTGGAC TTGTTAATAA CCACTTACTA ACCGGGACCT 60 

GTGACACTGG GCTAAACAAA GTAAGTCCCT GTTTACTCAG CAGTGTTTGG GGGACATGAA 120 

GGATTGCCTA GAAATATTAC TCCGGAATGG TCTACAGCCC AGACGCCCAG GCGTGCCTTG 180 

TTTTTGGATT CAGTTCTCCT GTGTGCATGG CTTTCCAAAA GGAGGTGGAG CTGTAGTTCT 240 

TTGGAATTGT GAACATTCTT TTGAAATATG GAGCCCAGAT AAATGAACTT CATTTGGCAT 300 

ACTGCCTGAA GTACGAGAAG TTTTCGATAT TTCGCTACTT TTTGAGGAAA GGTTGCTCAT 360 

TGGGACCATG GAACCATATA TATGAATTTG TAAATCATGC AATTAAAGCA CAAGCAAAAT 420 

ATAAGGAGTG GTTGCCACAT CTTCTGGTTG CTGGATTTGA CCCACTGATT CTACTGTGCA 480 

ATTCTTGGAT TGACTCAGTC AGCATTGACA CCCTTATCTT CACTTTGGAG TTTACTAATT 540 

GGAAGACACT TGCACCAGCT GTTGAAAGGA TGCTCTCTGC TCGTGCCTCA AACGCTTGGA 600 

TTCTACAGCA ACATATTGCC CACTGTTCCA TCCCTGACCC ATCTTTGTCG TTTGGAAATT 660 

CGGTCCAGTC TAAAATCAGA ACGTCTACGG TCTGACAGTT ATATTAGTCA GCTGCCACTT 720

CCCAGAAGCC TACATAATTA TTTGCTCTAT GAAGACGTTC TGAGGATGTA TGAAGTTCCA 780

GAACTGGCAG CTATTCAAGA TGGATAAATC AGTGAAACTA CTTAACACAG CTAATTTTTT 840

TCTCTGAAAA ATCATCGAGA CAAAAGAGCC ACAGAGTACA AGTTTTTATG ATTTTATAGT 900

CAAAAGATGA TTATTGATTG TCAGATAGGT TAGGTTTTGG GGGGCCAGTA GTTCAGTGAG 960

AATGTTTATG TTTACAACTA GCCTTCCCAG TAAAAAAAAA AAAAAAAAAA AAAAAAAA 1018



(2) INFORMATION FOR SEQ ID NO: 40: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 1897 base pairs 

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 40: 

CGGGGGGCTG GGACCTGGGG CGTAACCGTC TCTACCACGA CGGCAAGAAC CAGCCAAGTA 60 

AAACATACCC AGCCTTTCTG GAGCCGGACG AGACATTCAT TGTCCCTGAC TCCTTTTTCG 120 

TGGCCCTGGA CATGRATGAT GGGACCTTAA GTTTCATCGT GGATGGACAG TACATGGGAG 180 

TGGCTTTCCG GGGACTCAAG GGTAAAAAGC TGTATCCTGT AGTGAGTGCC GTCTGGGGCC 240 

ACTGTGAGAT CCGCATGCGC TACTTGAACG GACTTGATCC TGAGCCCCTG CCACTCATGG 300 

ACCTGTGCCG GCGTTCGGTG CGCCTAGCGC TGGGAAAAGA GCGCCTGGGT GCCATCCCCG 360 

CTCTGCCGCT ACCTGCCTCC CTCAAAGCCT ACCTCCTCTA CCAGTGATCC ACATCCCAGG 420 

ACCGCCATAC GACAGCCATC TGGTGCCAAR TCACTGAGCC CGTTGGGGTC CGCCGACCCC 480 

TGCGCCTGGG ATGGAAGCCC ACCTCAGCCA TGGGCAGACG TGCCCCCTCA TCCTACCGGC 540 

TGCCTCTGCT GGGGGAACCT ATGCCAACGG ACTTCTCCCT TCCCAACACT GGCTGAAGCA 600

GCAGCACCCA GGCCCTTCCC TGAACCAGAT GCAGAGAATA AACTATGAAA ACCTCTCTCA 660

GGCGCCTTCT GCTCTCAGGT GGAGTGGGCT GCCCCCCACT CTCTGCAGAG AGAGGCTACA 720

CCCACCTGGG GGGTCCTGGG AGGTAAGACT AGTAGGAGGT GCCAGGGCTG ARTCCAAAAG 780

CAGGAATGGC CAGGAMCAGG CCATACAGAT GAAGCTCAGG ATGTCACATA CCATGGACAM 840

TGAGACAGAA CCCCAGGTTG GAMTTCCCTT GGGCCAACGA GTGCCAGCTT TAATGTCAGC 900

TGCMGGTGCT CTGTGGCCTG TATTTATTCT TTAAACAGTA GCAAAGGCCA TTTATTTATT 960

CCACTTAGAA AGGAAACCTT GGTGGGTGGY TTCCCTCGAT GTGCTTTCCC CCACCTCCCT 1020 

GGAATGTGTG TGCCACACCT GTCCTTGTCC CAGGCCAGGA CTGTGGCACA TGAGCTGGTG 1080 

TGCACAGATA CACGTATGTC GTCGTGCATG ACCCCTGACT AGTTCCTAAG TAGCCCTGCA 1140 

CCAAGCACCA GAGCAGACCC CAAGAGAGGC CCGTGCAAGT CCCCATGTCC CCAGGTCCCT 1200 

GCTTCTGTTG CCTTGGGACT CATACACCGG CACACGTGTT TCAGCCTCTT GACTTCCATG 1260 

AGCTTCGAAT TTTGCCCCCG ATTCTTCTGA TATTTCCCAT TGGCATCCTC CAAAGCTCTG 1320 

GGCCTGGAGG GCATTAGGAC ACATGGAATG AGTGGGGTCT CCAGCCCCTG GGAAAGCCAC 1380 

TGGCAAGGCA GGATTAGAAA GACCAAGAGC AGGGTGGGGC GCCATGAAGC CTGTATGCCT 1440 

CTCAGGCTCA AGACCCCGCC ACACACCCAC TCAAGCCTCA GAAGTGGTGT GTAGGGCAGC 1500 

CCCAGGAGAG GAATGCCTGT CCTAGCAGCA CGTACATGGA GCACCCCACA TGTGCTCCAG 1560 

CCCTCTGGCT GTTTCTCTTG CTCTAGAATC AACTCCCTAC ATTGGGAATG TAGCCATTTG 1620 

GTAGAGGACT TGCCTAGCCT GCAGGAAGCT CACGTTCCAT CCCCTGCACC AAGGAGAATC 1680 

AAAGCTCAGG AGGCTGAGGC AGGAGGATTG CTGTCAGTGG TGTACAGAGG TCATGGCCAT 1740 

CCTGGGCTAT ATTAAACCTT GTCCTTTAAG AAAAAGAAAA GAAATCAACT TCCATTGAAT 1800 

CTGAGTTCTG CTCATTTCTG CACAGGTACA ATAGATGACT TKATTTGTTG AAAAATGKTT 1860

AATATATTTA CMTATATATA TATTTGTAAG AAGCATT 1897

(2) INFORMATION FOR SEQ ID NO: 41:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 134 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 41: 

Gly Gly Trp Asp Leu Gly Arg Asn Arg Leu Tyr His Asp Gly Lys Asn 
1 5 10 15

Gln Pro Ser Lys Thr Tyr Pro Ala Phe Leu Glu Pro Asp Glu Thr Phe
20 25 30 

Ile Val Pro Asp Ser Phe Phe Val Ala Leu Asp Met Xaa Asp Gly Thr
35 40 45 

Leu Ser Phe Ile Val Asp Gly Gln Tyr Met Gly Val Ala Phe Arg Gly
50 55 60 

Leu Lys Gly Lys Lys Leu Tyr Pro Val Val Ser Ala Val Trp Gly His 
65 70 75 80 

Cys Glu Ile Arg Met Arg Tyr Leu Asn Gly Leu Asp Pro Glu Pro Leu
85 90 95 

Pro Leu Met Asp Leu Cys Arg Arg Ser Val Arg Leu Ala Leu Gly Lys 
100 105 110 

Glu Arg Leu Gly Ala Ile Pro Ala Leu Pro Leu Pro Ala Ser Leu Lys
115 120 125 

Ala Tyr Leu Leu Tyr Gln
130 



(2) INFORMATION FOR SEQ ID NO: 42:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 265 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 42: 

AAGGGTAAAA AACTGTATCC TGTAGTGAGT GCCGTCTGGG GCCACTGTAG ATCCGAATGC 60 

GCTACTTGAA CGGACTCGAT CCCGAGACTG CCGCTCATGG ATTTGTGCCG TCGCTCGGTG 120 

CGCCTGGCCC TGGGGAGGGA GCGCCTGGGG GAGAACCACA CCTGCCGCTG CCGGCTTCCC 180 

TGAAGGCCTA CCTCCTCTAC CAGTGACGTT CGCCATCATA CCGCCAGCGC GACAGCCACC 240 

TGGTGCCAAC TCACTGAGCC GCCTG 265 

(2) INFORMATION FOR SEQ ID NO:43: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 2438 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 43: 

AAGTGGCGGC GGTCCCTGGA GAGCAGGCGG AGGCAGCGGC AAGTCTGACT CTGGGCTGAC 60 

CGTGGAGCCG GGGCGGGGGC TGACAGCCAG GCCTCCGCCT GGCGGGAGCC GCACGAGGAG 120 

CGGGAGTGGC CGGGCCTCTC TTCCGCGCTT GAGCGAGCGC CGGGTGATGG CGGTGGTGAT 180 

GGCGGCAGGC GCTCGGACAG CfCCGCTTGA GCTGAGCTCG GAGAGATCCG TCCAGAAAGT 240 

GCCCAGAAGA AACTTCCTCT TAGAAAAGCT GAAAAACACA RTATTTATAA CACTGGAAAT 300

TGTAAAGAAT TTGTTTAAAA TGGCTGAAAA CAATAGTAAA AATGTAGATG TACGGCCTAA 360

AACAAGTCGG AGTCGAAGTG CTGACAGGAA GGATGGTTAT GTGTGGAGTG GAAAGAAGTT 420

GTCTTGGTCC AAAAAGAGTG AGAGTTGTTC TGAATCTGAA GCCATAGGTA CTGTTGAGAA 480 

TGTTGAAATT CCTCTAAGAA GCCAAGAAAG GCAGCTTAGC TGTTCGTCCA TTGAGTTGGA 540 

CTTAGATCAT TCCTGTGGGC ATAGATTTTT AGGCCGATCC CTTAAACAGA AACTGCAAGA 600 

TGCGGTGGGG CAGTGTTTTC CAATAAAGAA TTGTAGTGGC CGACACTCTC CAGGGCTTCC 660 

ATCTAAAAGA AAGATTCATA TCAGTGAACT CATGTTAGAT AAGTGCCCTT TCCCACCTCG 720 

CTCAGATTTA GCCTTTAGGT GGCATTTTAT TAAACGACAC ACTGTTCCTA TGAGTCCCAA 780 

CTCAGATGAA TGGGTGAGTG CAGACCTGTC TGAGAGGAAA CTGAGAGATG CTCAGCTGAA 840 

ACGAAGAAAC ACAGAAGATG ACATACCCTG TTTCTCACAT ACCAATGGCC AGCCTTGTGT 900 

CATAACTGCC AACAGTGCTT CGTGTACAGG TGGTCACATA ACTGGTTCTA TGATGAACTT 960 

GGTCACAAAC AACAGCATAG AAGACAGTGA CATGGATTCA GAGGATGAAA TTATAACGCT 1020 

GTGCACAAGC TCCAGAAAAA GGAATAAGCC CAGGTGGGAA ATGGAAGAGG AGATCCTGCA 1080 

GTTGGAGGCA CCTCCTAAGT TCCACACCCA GATCGACTAC GTCCACTGCC TTGTTCCAGA 1140 

CCTCCTTCAG ATCAGTAACA ATCCGTGCTA CTGGGGTGTC ATGGACAAAT ATGCAGCCGA 1200 

AGCTCTGCTG GAAGGAAAGC CAGAGGGCAC CTTTTTACTT CGAGATTCAG CGCAGGAAGA 1260

TTATTTATTC TCTGTTAGTT TTAGACGCTA CAGTCGTTCT CTTCATGCTA GAATTGAGCA 1320

GTGGAATCAT AACTTTAGCT TTGATGCCCA TGATCCTTGT GTCTTCCATT CTCCTGATAT 1380 

TACTGGGCTC CTGGAACACT ATAAGGACCC CAGTGCCTGT ATGTTCTTTG AGCCGCTCTT 1440 

GTCCACTGCC TTAATCCGGA CGTTCCCCTT TTCCTTGCAG CATATTTGCA GAACGGTTAT 1500 

TTGTAATTGT ACGACTTACG ATGGCATCGA TGCCCTTCCC ATTCCTTCGC CTATGAAATT 1560 

GTATCTGAAG GAATACCATT ATAAATCAAA AGTTAGGTTA CTCAGGATTG ATGTGCCAGA 1620

GCAGCAGTGA TGCGGAGAGG TTAGAATGTC GACCTGCATA CATATTTTCA TTTAATATTT 1680

TATTTTTCTT ATGCCTCTTT GAATTTTTGT ACAAAGGCAG TTGAATCAAA TAAAACTGTG 1740 

CCCTAAGTTT TAATTCCAGA TCAATTTATT TTTTTTATGA TACACTTGTT ATATATTTTT 1800 

AAGCAGGTGT TTGGTTTTGT TTTTACCATA TAAATTTACA TATGGTCCAG GCATATTTAC 1860 

AATTTCAAGG CATTGCATAT ACATTTGAAT ATTCTGTATT TTTTAAATAA TCTTTTGTTC 1920

TTTCCTATGT GTGAAATATT TTGCTAATCT ATGCTATCAG TATTCTTGTA TGACCGAATA 1980 

GTTACCTATT CTCTTTTCAT CTTGAAGATT TTCAGTAAAG AGTGTTGTAA TCAATCCATT 2040 

ATAATGTAAT TGACTTTTGT AATTTGCCAA TAGGAGTGTT AAACAACAAA ATGATTTAAA 2100 

ATGAAACTTA ATGTATTTTC ATTTTAAATA TTAACTAAAC CAAGTTTGTT TGTTAGTTAT 2160 

TCTAGCCAAT AAGTU^AAGAG AATGTAGCAT CCTAGAGGTG TATTTGTTCT GCAGTTTGGC 2220 

AGGACCGTCA GTTAGTCCAA ATAAACATCC CCTCAGCGTG GAGGCGAATG GAACCTGTGC 2280

TCCTTTCTTA CGGGAAGCTT TGCAAAGCAA AATAGCAGGG TTACAAGCTT GGAGTTGTTA 2340 

AGGCAACTAG AGTTTTCTCT ATTAATTTAT AGACTGTTGT TGCACCTACT TAGCTCTTTT 2400 

TTGGGAACTC TAGTTCCCAG GGGAAAATAC CTCGTGCC 2438 

(2) INFORMATION FOR SEQ ID NO: 44:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 542 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 44: 

Ser Gly Gly Gly Pro Trp Arg Ala Gly Gly Gly Ser Gly Lys Ser Asp 
1 5 10 15

Ser Gly Leu Thr Val Glu Pro Gly Arg Gly Leu Thr Ala Arg Pro Pro
20 25 30 

Pro Gly Gly Ser Arg Thr Arg Ser Gly Ser Gly Arg Ala Ser Leu Pro 
35 40 45 

Arg Leu Ser Glu Arg Arg Val Met Ala Val Val Met Ala Ala Gly Ala 
50 55 60 

Arg Thr Ala Pro Leu Glu Leu Ser Ser Glu Arg Ser Val Gln Lys Val
65 70 75 80 

Pro Arg Arg Asn Phe Leu Leu Glu Lys Leu Lys Asn Thr Xaa Phe Ile
85 90 95 

Thr Leu Glu Ile Val Lys Asn Leu Phe Lys Met Ala Glu Asn Asn Ser
100 105 110 

Lys Asn Val Asp Val Arg Pro Lys Thr Ser Arg Ser Arg Ser Ala Asp 
115 120 125 

Arg Lys Asp Gly Tyr Val Trp Ser Gly Lys Lys Leu Ser Trp Ser Lys 
130 135 140 

Lys Ser Glu Ser Cys Ser Glu Ser Glu Ala Ile Gly Thr Val Glu Asn
145 150 155 160 

Val Glu Ile Pro Leu Arg Ser Gln Glu Arg Gln Leu Ser Cys Ser Ser
165 170 175 

Ile Glu Leu Asp Leu Asp His Ser Cys Gly His Arg Phe Leu Gly Arg
180 185 190 

Ser Leu Lys Gln Lys Leu Gln Asp Ala Val Gly Gln Cys Phe Pro Ile
195 200 205 

Lys Asn Cys Ser Gly Arg His Ser Pro Gly Leu Pro Ser Lys Arg Lys 
210 215 220 

Ile His Ile Ser Glu Leu Met Leu Asp Lys Cys Pro Phe Pro Pro Arg
225 230 235 240 

Ser Asp Leu Ala Phe Arg Trp His Phe Ile Lys Arg His Thr Val Pro
245 250 255 

Met Ser Pro Asn Ser Asp Glu Trp Val Ser Ala Asp Leu Ser Glu Arg
260 265 270 

Lys Leu Arg Asp Ala Gln Leu Lys Arg Arg Asn Thr Glu Asp Asp Ile
275 280 285 

Pro Cys Phe Ser His Thr Asn Gly Gln Pro Cys Val Ile Thr Ala Asn
290 295 300 

Ser Ala Ser Cys Thr Gly Gly His Ile Thr Gly Ser Met Met Asn Leu
305 310 315 320

Val Thr Asn Asn Ser Ile Glu Asp Ser Asp Met Asp Ser Glu Asp Glu
325 330 335 

Ile Ile Thr Leu Cys Thr Ser Ser Arg Lys Arg Asn Lys Pro Arg Trp
340 345 350 

Glu Met Glu Glu Glu Ile Leu Gln Leu Glu Ala Pro Pro Lys Phe His
355 360 365 

Thr Gln Ile Asp Tyr Val His Cys Leu Val Pro Asp Leu Leu Gln Ile
370 375 380 

Ser Asn Asn Pro Cys Tyr Trp Gly Val Met Asp Lys Tyr Ala Ala Glu 
385 390 395 400 

Ala Leu Leu Glu Gly Lys Pro Glu Gly Thr Phe Leu Leu Arg Asp Ser 
405 410 415 

Ala Gln Glu Asp Tyr Leu Phe Ser Val Ser Phe Arg Arg Tyr Ser Arg
420 425 430 

Ser Leu His Ala Arg Ile Glu Gln Trp Asn His Asn Phe Ser Phe Asp
435 440 445 

Ala His Asp Pro Cys Val Phe His Ser Pro Asp Ile Thr Gly Leu Leu
450 455 460 

Glu His Tyr Lys Asp Pro Ser Ala Cys Met Phe Phe Glu Pro Leu Leu 
465 470 475 480 

Ser Thr Pro Leu Ile Arg Thr Phe Pro Phe Ser Leu Gln His Ile Cys
485 490 495 

Arg Thr Val Ile Cys Asn Cys Thr Thr Tyr Asp Gly Ile Asp Ala Leu
500 505 510 

Pro Ile Pro Ser Pro Met Lys Leu Tyr Leu Lys Glu Tyr His Tyr Lys
515 520 525 

Ser Lys Val Arg Leu Leu Arg Ile Asp Val Pro Glu Gln Gln
530 535 540 

(2) INFORMATION FOR SEQ ID NO: 45: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 4999 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 45: 

CCCTCTGGGC AAGCCGCCCC CCCCCCACCC ATCTACCACA CACACACACA CACACACACA 60 

CACACATTCA GACCTTGGGG CAAAAACAAA GCAAAATAAC AACAACAAAA ACACTGCCTG 120 

TGGAAAGTCC TTACTTCAGG AAGGTTGGCA GATGAGGAGC AAGGGAACAT TTTATCAGGA 180 

CTGCCACAAA GGAGTCTTTT TTTTTAATGG TTTTTCAAGA CAGGGTTTCT CTGTATAGCC 240 

CTGGCTGTCC TGGAGCTCAC TTTGTAGACC AGGCTGGCCT CGAACTCAGA AATTCGCCTG 300 

CCTCTGCCTC CTGAGTGCTG GGATTAAAGG CGTGCAGCAC CATGTCCAAC TGGCATTTTC 360 

TCAATTAAGG TTCGTTCCTT TCAGATAACT CTAGGTTCTG GGTCAAGCTG ACACAAGGCT 420 

ACACAGCACA GTTTGTATGC CACATTCAGT TCAGAAGACA CCCAACCTCC CTGGAACTGG 480 

AACTTATGCA CATTTGTGAG CTTCCACTTG GGAGTGGGAA CCTGAACTGG GTCCTCTGCA 540 

AGAGCAGCCG TGCTCTTAAC TGCTGAGCCA TTTCAGCAGC CTCACATCAG AATTAAGTTA 600 

GAAATTAGCCG GGTATGAATC ATACCCTTAG AATCCTAGCA TCTGAAAGCA GAGCTAAGAG 660

AAACAGGGAT TCAAGACCAG CTCTTGGCTA CAGAGCCCGT CCTGTCCTAG GATGGGCTAC 720 

AAGAGACTAT TTCAAAGCCA TCCAAACAAC AATAACTACA ACAACAACAA GGTTAAAATT 780 

AGGCTGGGCA CAGGGTACAC ACCTTTAATG CCAACACTCA GGAGGCAGAG GCAGGCTGAT 840 

CAGTGTGAGT TTGAGTTCAA CGTGGTCTAC ATAGGGAGTT CTAGGCCAGC AGAGGTTACA 900 

GTCTCTCTCT CTCTCTCTCT CTCTCTCTCT CTCTCACACA CACACACACA CACACACACA 960 

CACACACACA CACACACGGT GGCATTATGG GATTTTTTTG GGATAAGGTT TCTCTGTCTA 1020

GCCCTGGCAT AGATTCACTC TGTAGACTAG GCTAGCCTTG AACTCAGAGA TCCGCCTGCC 1080 

TCTGCCTCCC AAGTGCTGGG ATTATAGGTG TTGCACCACC ACTGCCCAGC CACTTTGGGA 1140 

TTTTTGAACT GTTATCAAGA GGCTTTCGAG GAGGTCAAAC TTCAACAGCA ACCTCTCCAT 1200 

GATAATGTAG CTAATGATCA AACGACACTC AAAACTTAAC CCTTAAAGCA CACATCCACC 1260 

AGACAGCGTG CCCACTCGTA GTTCCATTAC TCAGGAGGCT GAAGCAGGAG GATGAAGGAC 1320 

TAAGGCTTCA GCAACCTAGG GAGCCGCAGG GGACAGTAGT CTCAATCCCT ACATTCTCCT 1380 

GAACACAGGA GCAGGAGTTC AGGAAGGGTG TCAAGGCCGC TTACTGATCT TAGGGCCTCA 1440 

GGAATGACTA GCTCAGGCAG AGAGAGCAAA GGTCTCCAGT GGAGAAGTCT ACACACACAC 1500 

ACACACACAC ACACACACAC ACACACACAC AGAATCCAAG GCGATGACGT CATCAAAGGG 1560 

TTAATTCTAG TCTGGGATGG GGGGGAGGGT GGGGCACGCA GCTGTCAGGT GGCTTTGGAA 1620 

AAATAAACTG CTGAAGAGTC TGACGCCAGG GAGTCCTGGG AGGGACAAGA GGTTACCCAC 1680 

TCAAAGAGTG TGCTCCACAA AGCATGCGCG CTTGTCCACG TCTGGAGTCG TCACTTATTT 1740 

TTTGCCTGGA TTCTTTGTAG CCGGTGGGTT CTCAAGGCGG TAAGTGGTGT GGCCGCCGTG 1800 

GTCTGGGAGG TGACGATAGG GTTAATCGTC CACAGAGCCC AGGGGCGGAG CGCGGGCGGG 1860 

CGTCCGCAGC CCCGCTGGAG CCGGAAGCAG TGGCTGGTCA GGGGCGCTTC TAGCCTTCCC 1920 

TATCTGTACT TCCACAGAGG TCTCTGCGAG CTAGGGGGAC AGTGAGGTGC GGGGTAGGGG 1980

CCCGGCGTTA GAGCCAGCAA GGGGACGGTT CACGGTAAGG TCTGAGGGAG AGAGAGCTCC 2040

TGAGAAACTT GGGGGGCGCG ACACAGATAG GGTGAAAGCA GAGTGATAGA CCTGGGATGG 2100 

TTAGGGGACC AAGGGAAGAC CAGGCTGGTT GGCATACACC GGTGAACGGA TGGGAGTCCT 2160 

AGGGAAAGAT GATGCGCCTA ACAGTCCTTT CTGTCTCCAC ACCACTCCAG GGGACGATCC 2220 

GGAGCTCAAC TTTCAAAAGC GAGACGCCCC AGCAAGCCTG TTTTGAGAAG TTCTTCAGCG 2280

GCTCTCCTCA TGGGCCAGAC GGCCCTGGCA AGGGGCAGCA GCAGCACCCC TACCTCGCAG 2340 

GCTCTGTACT CGGACTTCTC TCCTCCCGAG GGCTTGGAGG AGCTCCTGTC TGCTCCCCCT 2400 

CCTGACCTGG TTGCCCAACG GCACCACGGC TGGAACCCCA AGGATTGCTC CGAGAACATC 2460 

GATGTCAAGG AAGGGGGTCT GTGCTTTGAG CGGCGCCCTG TGGCCCAGAG CACTGATGGA 2520 

GTCCGGGGGA AACGGGGCTA TTCGAGAGGT CTGCACGCCT GGGAGATCAG CTGGCCCCTG 2580 

GAGCAAAGGG GCACACACGC CGTGGTGGGC GTGGCCACCG CCCTCGCCCC GCTGCAGGCT 2640 

GACCACTATG CGGCGCTTTT GGGCAGCAAC AGCGAGTCCT GGGGCTGGGA TATTGGGCGG 2700 

GGAAAATTGT ATCATCAGAG TAAGGGCCTC GAGGCCCCCC AGTATCCAGC TGGACCTCAG 2760 

GGTGAGCAGC TAGTGGTGCC AGAGAGACTG CTGGTGGTTC TGGACATGGA GGAGGGGACT 2820 

CTTGGCTACT CTATTGGGGG CACGTACCTG GGACCAGCCT TCCGTGGACT GAAGGGGAGG 2880 

ACCCTCTATC CCTCTGTAAG TGCTGTTTGG GGCCAGTGCC AGGTCCGCAT CCGCTACATG 2940 

GGCGAAAGAA GAGGTGAGAT ACGGACTAGG TGTGGGGAGA TCACTACTCT TGGCAATGGT 3000 

TTGGGCTGGA AACTCATGGT TGGAGCACAG GAAGTAGGCT TCTTGTCACT TTGGCCTGTC 3060

ACTTAGATGG CCTTGGATCT AGCTTCACTC CCAATCCCTA TTGGATGTGA TGCACAAATT 3120 

CAGAGCCTTT GGGTCTCCCT CAGCTGAGGT GGCGGTGGAA ATGGAGGAAG AAGGAAGGGT 3180 

GCCTGAGCAG GATCTCAAGT TCAAGGATGC CTGGAGTTGC TTACTTACCT TGTCTTCCTT 3240 

CTCTCTCCGC AGTGGAGGAA CCACAATCCC TTCTGCACCT GAGCCGCCTG TGTGTGCGCC 3300 

ATGCTCTGGG GGACACCCGG CTGGGTCAAA TATCCACTCT GCCTTTGCCC CCTGCCATGA 3360

AGCGCTATCT GCTCTACAAA TGACCCAGTA GTACAGGGTG TGCTGGCACC CTACCGTGGG 3420 

GACAGGTGGA GAGGCACCCG CTGGCCTAGA CAACTTTAAA AAGCTGGTGA AGCTGGGGGG 3480 

GGGGGGCTGG ACCCCTTCAC CTCCCCTTCT CACAGGAGCA AGACATATAG AAATGATATT 3540

AAACACCATG GCAGCCTGGG ACAAAGAGGT TTTTGAAGTA AAAAATGAGA TGTATTGTCA 3600

CAACCTGTTT CATTATTGTT TTTTGTTTTG TTTTACACTC CCCCACCCCA GGCTAGAGCC 3660 

CCATCACTGT CTTAAGGAAT TATGACAACC CACAAAGCTC AGGCCCAGGT GTTTATTTCC 3720 

CTTACATGTA GGATGGTTCA CAAACACAAT ACAGGGGCTT TGGCACCGTG GGGGAGGGGA 3780 

CTATCCCAGG CCTCTTAGGG TCTCATGTAT ACCGAATTCA GACCCGAAAG CTCTGAATTT 3840

CTGCATCAGA CATCCAGTAG AACTTGGGAG TGAAGCTAGA GCCAAGGCCA TCTAAGTGAC 3900 

AGGCCAAAGT GACACGAAGC CCACTTCCTG TGCTCCAACG ATGAGTTTCC AGCCCAAACC 3960 

AATGGAAGGT GATTTCACTT GTCAGGGCCC AAAGGGACAG TCAGTTCTAC TCCCTCCCCT 4020 

CACTAGGAGC CACCTTGGTG ACAGTTGATT CTACCCACTG TAAGTGGTAA AGGGATTGGC 4080 

CTGGTCCCAA CCATAATAGG GCGGTGGAAA CGGCTCAGGA GGGTACAGCG TGGATTAGGC 4140 

CACAAGATGG GGCAGATGAT GTCATCAGAA GCATGTGACC GGTGGGAGCA GTTACTAAAC 4200 

TTCTGGGCAA CCTAGTCCAT GCTATGCAGG CAGGTAGAGG GATGGGCAGT GCTCATTGTT 4260 

TGGCATTGAT GATGTCCACA AATTCAGGCT TGAGAGATGC GCCACCCACA AGGAAGCCGT 4320 

CCACGTCAGG CTGGCTTGCC AGCTCTTTGC AGGTTGCTCC AGTCACAGAA CCTGTACCAG 4380 

GAACAAGAAG ACAGTTTGGT CAGGTCTATG ATCAGAACAC TTAAGCCCCA CCTCTCTGTG 4440 

CAAGGCAGCC TCAGTCTGTC TTAGCCCATT TCCGTCTTAG CTAGAGCCT^ AGCCACTCAC 4500 

CTCCATAAAT GATCCGGGTG CTCTGAGCCA CCCCATCATT GACATTGGAT TTCAGCCATC 4560 

CCCGGAGCTT CTCGTGTACT TCCTGTGCCT AGAAGGAGGA GGCAGAGCTA CTAAGTAAGC 4620 

TCCTTCCTAT CTATCATTCA AGGAGTAAAA ACCACTGGTT CTCACATAGA GTTGAGTTTC 4680 
CAGAAAAGCC CCGGGACCAG AGAGTGGCAA GGCTCCAATC CCACCAGGCT TGGAATGAAC 4740

ATTTTTGGCA AAGTCACTCT CCTTGGTGAG TTTGGGGGCC CTCTGTCTCT AAAGGGGCTT 4800 

GGATGGGCTC CATAGCTGTG TGAGTCTGTT AAAGCCGGAC AGGCTGAGGA GCTCTGGGTA 4860 

GTTACCTGCT GAGGGGTTGC CGTCTTGCCA GTCCCAATGG CCCACACAGG TTCATAGGCC 4920 

AGGACCACCT TGCTCCAGTC TTTCACATTA TCTGTGGGGC AGAGAGGAGA GTGAGTAGGA 4980 

AGGAGCTGAC CCGCCAAGC 4999 
(2) INFORMATION FOR SEQ ID NO: 46: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 264 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: protein

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:46: 

Met Gly Gln Thr Ala Leu Ala Arg Gly Ser Ser Ser Thr Pro Thr Ser
1 5 10 15

Gln Ala Leu Tyr Ser Asp Phe Ser Pro Pro Glu Gly Leu Glu Glu Leu
20 25 30

Leu Ser Ala Pro Pro Pro Asp Leu Val Ala Gln Arg His His Gly Trp
35 40 45

Asn Pro Lys Asp Cys Ser Glu Asn Ile Asp Val Lys Glu Gly Gly Leu
50 55 60

Cys Phe Glu Arg Arg Pro Val Ala Gln Ser Thr Asp Gly Val Arg Gly
65 70 75 80

Lys Arg Gly Tyr Ser Arg Gly Leu His Ala Trp Glu Ile Ser Trp Pro
85 90 95

Leu Glu Gln Arg Gly Thr His Ala Val Val Gly Val Ala Thr Ala Leu
100 105 110

Ala Pro Leu Gln Ala Asp His Tyr Ala Ala Leu Leu Gly Ser Asn Ser
115 120 125

Glu Ser Trp Gly Trp Asp Ile Gly Arg Gly Lys Leu Tyr His Gln Ser
130 135 140

Lys Gly Leu Glu Ala Pro Gln Tyr Pro Ala Gly Pro Gln Gly Glu Gln
145 150 155 160

Leu Val Val Pro Glu Arg Leu Leu Val Val Leu Asp Met Glu Glu Gly
165 170 175

Thr Leu Gly Tyr Ser Ile Gly Gly Thr Tyr Leu Gly Pro Ala Phe Arg
180 185 190

Gly Leu Lys Gly Arg Thr Leu Tyr Pro Ser Val Ser Ala Val Trp Gly
195 200 205

Gln Cys Gln Val Arg Ile Arg Tyr Met Gly Glu Arg Arg Val Glu Glu
210 215 220

Pro Gln Ser Leu Leu His Leu Ser Arg Leu Cys Val Arg His Ala Leu
225 230 235 240

Gly Asp Thr Arg Leu Gly Gln Ile Ser Thr Leu Pro Leu Pro Pro Ala
245 250 255

Met Lys Arg Tyr Leu Leu Tyr Lys
260
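Each protein listing is the conceptual translation of a coding region in the nucleotide listing before it. As a spot check, the sketch below translates the first 17 codons with the standard genetic code. It is a minimal illustration, not part of the specification: the function `translate()` and the variable names are hypothetical, and the two 51-nucleotide fragments were copied by hand from the ATG on the line ending at position 2340 of SEQ ID NO:45 (the listing immediately before SEQ ID NO:46) and of SEQ ID NO:47.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1). Codons are enumerated
# with the first base varying slowest, in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(cds: str) -> str:
    """Translate an in-frame coding sequence into one-letter amino acids.

    Spaces (as in the 10-base groups of a sequence listing) are ignored,
    translation stops at the first stop codon, and a trailing partial codon
    is dropped. A codon containing anything other than A, C, G or T raises
    KeyError.
    """
    cds = cds.upper().replace(" ", "")
    protein = []
    for i in range(0, len(cds) - 2, 3):
        aa = CODON_TABLE[cds[i:i + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


# Hypothetical hand-copied fragments: 17 codons starting at the ATG.
seq45_start = "ATGGGCCAGACGGCCCTGGCAAGGGGCAGCAGCAGCACCCCTACCTCGCAG"
seq47_start = "ATGGGCCAGACAGCTCTGGCAGGGGGCAGCAGCAGCACCCCCACGCCACAG"
print(translate(seq45_start))  # MGQTALARGSSSTPTSQ
print(translate(seq47_start))  # MGQTALAGGSSSTPTPQ
```

The two outputs reproduce residues 1-17 of SEQ ID NO:46 and SEQ ID NO:48 respectively, and differ only at positions 8 (Arg/Gly) and 16 (Ser/Pro).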



(2) INFORMATION FOR SEQ ID NO: 47: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 5615 base pairs 
(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 47:

GTACTTTCTT TATATCTCCA TAATTTTATT TACTATTACT ACATGATACA TTATTTTATA 60 

AAAGTCTTTG TAACCTCCTT AAGGATTCAC TGCTTAATCT CCAGTGCTTA GCACAAATCA 120 

TTAAATGCGA ACCAGAAACT CTTCCAAATG TGTTACATCT ATAACCTCAT TGGATTCTCA 180 

CTACCAACCC CATGCAATAG ATACTAATGT GATCTCTGTC TTACAGAGGA AGAAACAGGC 240 

ACAGGGAGGT TCAGTAATTT GCCCAAGGTC ATACACACAC TGGCCTTCAG GTATTCATGC 300

CCGGGGAGTC TGGTCCCACA GCTGGCATGT TTGCCATTAT ATTATATTGC CTCCTTATAG 360

TGTCGGCACT CATTAAGCAC ATTGACAGCT ATGCTTGGTG AGTGACTACT ATGTACCCAG 420 

CTCTGTGCTA CATGCTTTAC CTGGATTATT TCAACTGCAC AACAACCCTG TGAGGTAACT 480 

ACCATCATTG CTCCTATTTT ACATAACAGA AAACTACAGA AATCTGGGGC TGGGCGTAGT 540 

GGCTCATGCC TGAAATCCCA GCACTTTGGG AGACCCTGTC TCTAAAAAAA ATTTTTTTTT 600 

GGCCGGACGT GGTGGCTCAC ACCTGTAATC TCAGCACTTT GGGAGGCTAA GGCAGGCAGA 660 

TCACAAGGTC AGGAGTTCTA GACCAGCCTG GCCAACATGG CAAAACCCTG TGTCTACTAA 720 

AAATACAAAA AATAGCTAGG CGTGGTGGCA GGTGCCTGTA ATCCCAGCTA CTCAGGAGGC 780 

TGAGGCAGGA GAATCCCCTG AACCTGGGAG ATGGAGGTTA CAGAGAGCCG AGATCGTGCC 840 

GCTGCACTCC AGCCTGGGCA ACAAGAGCAA GACTCTGTCT CGAAAAAAAT AAAAATAAAA 900 

ATAAAAATAT TTTTTTAAAA ATTAGCTGGG TGTGGTAGCA CATGCCTGTA GTCCCAGCTA 960 

CTTGGGAGGC TGAGGTAGGA GGATCACTTG AGCCCAGGAG GTCAAGGCTG CAGTGGGCTG 1020 

TGATGGCGCC ACTGCACTCT AGCCTTGGTG ACAGCAAGAC CCTGTCTCAA AAAAAAAAAA 1080 

AAGAGAAATC GGGCAACTTC CCCAAGATCG CGCAGTTAAC TAGTGGCATA GCTTCACTCA 1140 

AACTCGAAGT CTTAATCAGG ACACTCTACC AAATGAGATC AACGGCTCAG TAATGGATTG 1200 

GCATCCAGTA TGAAGACTGG ACCAGCAGGG AGAACTATGA TGCGTACAGC CTAGAGCCTG 1260 

AAGCAGATTT CACAGCCTCA GAGGTGGCAC AGGCTGACTC ACAACCCGGG GCAGAAAGGG 1320 
ACCAGCCCAG AAACAGTGAC CCAGAATCAC AGGGAAGTAG AAATGGGATT CGGCACAATG 1380

AAGCCCCTCC TTGACCCCAT GCTCCTTACC CTCAGGGGCG CAGGAGTTAG TCGCTCAGGC 1440 

GGCTCAAAGG TCTTGACGGT GGAGAACACC ATCCCCAGGG ATTCCCGACG CGGTGATGCC 1500 

ATCAAAGCGT TAATTCTGAG ATGGGCCTGC CCGGGTGCGG ACTCTGCCGC AGCAAGAGAA 1560 

GGGTTAACTG CCCCGGGCCT TCGCCGTGGG GGCGGGGCCT CGGGGAGGGT CACAGCCCGG 1620 

GACTGAGACC CGAGGTTAAC CGCCCGGGGT GGGCTCCACG GGGGCGGGGC ATGCTCTCCG 1680 

CGGCTGCTGC CGGTATAGAG CGGTAACTGC CCAGGAGGGG GCGGGGCCCC ACAGGGGCGT 1740 

GGCCTCGGAG CTGCACGGCC GTGGGCGGCG ATGAGAGGGT TAAGCCCCAG AGGGCCCTGG 1800 

AGGGGCGGGG CCGCGGGACG GGCTCGGCCC AAGGGAGGAG CTGGGGGCGG AAGCGGCCGG 1860 

CGGTCTGCGC CCTGCGCGCC TCGGCTTCTT TCCGCCCGGC TCCTTCAGAG GCCCGGCGAC 1920 

CTCCAGGGCT GGGAAGTCAA CCGAGGTTCG GGGGCAGCGG CGAGGGCTCC GGGCGAGTAA 1980 

GGGGGATGGT CCATGCTGAG GCCCAAATGG GGCGAACTCG CGAGAGTCTC TGGCGACCTG 2040 

GATCAGATGG GGCGAGGGCA GATGAAGGGC CCAGGAGCTT TGGGGCAGCG AGGAGGGAGG 2100 

AGCGGGCCCG TTGGCAAACT TGGGTGAAAG GATGGGGTAC CTGGGTGACG AGCCCCCGCC 2160 

AGGATTCTGC TCTTCACGCC CCTTTTCTCC CAGCTCCCTT CCAGGTCAAT CCAAACTGGA 2220 

GCTCAACTTT CAGAAGAGAA AGACGCCCCA GCAAGCCTCT TTCGGGGAGT CCTCTAGCTC 2280 

CTCACCTCCA TGGGCCAGAC AGCTCTGGCA GGGGGCAGCA GCAGCACCCC CACGCCACAG 2340 

GCCCTGTACC CTGACCTCTC CTGTCCCGAG GGCTTGGAAG AGCTGCTGTC TGCACCCCCT 2400 

CCTGACCTGG GGGCCCAGCG GCGCCACGGT TGGAACCCCA AAGACTGTTC AGAGAACATC 2460 

GAGGTCAAGG AAGGAGGGTT GTACTTTGAG CGGCGGCCCG TGGCCCAGAG CACTGATGGG 2520

GCCCGGGGTA AGAGGGGCTA TTCAAGGGGC CTGCACGCCT GGGAGATCAG CTGGCCCCTA 2580 

GAGCAGAGGG GCACGCATGC CGTGGTGGGC GTGGCCACGG CCCTCGCCCC GCTGCAGACT 2640 

GACCACTACG CGGCGCTGCT GGGCAGCAAC AGCGAGTCGT GGGGCTGGGA CATCGGGCGG 2700 
GGGAAGCTGT ACCATCAGAG CAAGGGGCCC GGAGCCCCCC AGTATCCAGC GGGAACTCAG 2760

GGTGAGCAGC TGGAGGTGCC AGAGAGACTG CTGGTGGTTC TGGACATGGA GGAGGGAACT 2820 

CTGGGCTACG CTATTGGGGG CACCTACCTG GGGCCAGCAT TCCGCGGACT GAAGGGCAGG 2880 

ACCCTCTATC CGGCAGTAAG CGCTGTCTGG GGCCAGTGCC AGGTCCGCAT CCGCTACCTG 2940 

GGCGAAAGGA GAGGTGAGGC CTGGGGCAGA CGTGGGGAGA ACTTTCTGTC CCTGGTGGCA 3000 

GTGGTTTGGG ATGGAAACTC TTCTGACAAG AGCAGAGGGG ATGGACCTTC ATCCAGCCTG 3060 

CCTCAACCTC TGTTCAGTGC TGGGAAAGGC TAGGGGTCTT CACAGCTGTT ATTTAATTTA 3120 

ACCCAACAGC AATAGAGGTG AAACAGGCTT GAGAAAGCAA CTTTCTCAAG TTCTCTTGGC 3180 

CAGTAAATGG TGAACCTTCA GAATGGAGGG AGGAACTGCA GGGATGAGAG AATTCAGGAG 3240 

ATATCAACCC CTGAGCAAGA GGTGCAAAGC GTTAGGTACT GGGTTTGATG TACAGGTCCA 3300

AAAGAAGGAT GGGCAGAGCC AGGTACCCAG GCTGTATACC GGATTCCCTG GGCTCTAACC 3360 

TGTCTCTGTG CCACATACCT ACTTCCTTCC TCAGCCACAC CTCTGGATGG AGACACTGGG 3420 

GCCCTGGGCA CCAGGGAGGA GAGCAGTGGA GGAGGCAGGG CCTTAGGGTG GGGCAGCAGG 3480 

GGAGGAGCCT CCCCAGGAAC TGACTGGGTC CAGGGCTTGG AGCTGCTCTC TGCAGTTGTG 3540 

TGGGCTGTAG AGTGGAGGGC CATCCCTCCT CACCTCAGCC CCAGCTCCCA AGCCTCTGGA 3600

GTCAAAGCCT GGGCCAGCTC CACCACTGTC AGAGCCACCT TGGCCTGTTG TTTAGAGGGC 3660

CTTAGCCAGC TCTTCACCCC CAGCTCTGAC TAGGGATGTG TGAAATCTTA TCTGGGAGGC 3720 

AGAACTTCCG GGTATCTCAA ATTCCCCTTT CAGCCAGGTG GGCACACTCG AAGCAGGAAA 3780 

GCAGAAAGGC ATCTGAGTAG GACCCCGTAG TTTGAGGACA TCTGGCTGGT GGCTGCACCC 3840 

ATACTTACAT TCCCCTCCTT CTCTCTCCCA GCGGAGCCAC ACTCCCTTCT GCACCTGAGC 3900 

CGCCTGTGTG TGCGCCACAA CCTGGGGGAT ACCCGGCTCG GCCAGGTGTC TGCCCTGCCC 3960 

TTGCCCCCTG CCATGAAGCG CTACCTGCTC TACCAGTGAG CCCTGTGATA CCACAGACTG 4020

TGCTGAGGTC TTGCCACCAC CCCTCCCCTT GGGGAGGTGG GGAGGCACTG CTGGCCTAGA 4080 
CCAGCTGCTG AAAGCTGGTG AGGCTGAGCC CCTACCCCAA CCCAAGCTCT GCGGAAATCA 4140

ACAGCCCCAG AGCCACTTGG AGGGAGGAAG AAAGGGAGCC GGCGTTCAAG GCTATGACAG 4200 

TCTGCTACGC AAAACATTTT TTCAAGTAAA AATAGTAAGA GATGTTGTTA TAGAAACCTG 4260 

TTCTTGTTTT TTTTTTTTTC TTGCACAAAT GATCATTTAT ATAGCTGCCT CAAAAAGGAA 4320 

GATTATCTGG GCAAGTCCAG TGAAGGCAGA CAAACCACAA GACCTAGTGC CAGGTTTATT 4380 

CCCTCACATG GGTGGTTCAC ATACACAGCA CAGAGGCACG GGCACCATGG GAGAGGGCAG 4440 

CACTCCTGCC TTCTGAGGGG ATCTTGGCCT CACGGTGTAA GAAGGGAGAG GATGGTTTCT 4500 

CTTCTGCCCT CACTAGGGCC TAGGGAACCC AGGAGCAAAT CCCACCACGC CTTCCATCTC 4560 

TCAGCCAAGG AGAAGCCACC TTGGTGACGT TTAGTTCCAA CCATTATAGT AAGTGGAGAA 4620 

GGGATTGGCC TGGTCCCAAC CATTACAGGG TGAAGATATA AACAGTAAAG GAAGATACAG 4680 

TTTGGATGAG GCCACAGGAA GGAGCAGATG ACACCATCAG AAGCATATGC AGGGAAAGGG 4740 

CAGTTACTGG GCTTCTGGGC TGCTTAGTCC CTGGCTTGGC AGGAAGGGTA GGGAAGATGG 4800 

ATGGGGCTCA TTGTTTGGCA TTGATGATGT CCACGAATTC GGGCTTGAGG GAAGCACCAC 4860 

CCACAAGGAA GCCATCCACA TCAGGCTGGC TGGCCAGCTC CTTGCAGGTT GCCCCAGTCA 4920

CAGAGCCTGG GAAGGGAGCA GAACAAGGGC TTGGTCAAGA ATGGGATGAG TCTGCCCCAT 4980 

CCCCACCTCC ATGTCCGAGG GCTCAGTCTA GTCCTCAGCC CACTCCACCT CAGCCGGGAA 5040 

CCAAAGCCAC TCACCTCCAT AAATGATACG GGTGCTCTGA GCCACCGCAT CAGAGACGTT 5100 

GGACTTCAGC CATCCTCGGA GCTTCTCGTG TACTTCCTGG GCCTAGAACA AGAAGCTGGC 5160 

CTAAGTAAGA CCTTTTCTGC CTCTCTAAGA GGAAAAATCA CTGGCACCAG TGGACACTTA 5220 

GTGTGGTTTC TGACTGAGTC AGAGTACCAG GGCTCTGATC CAAGCCAGGC CCTGGACTGG 5280 

ATGCCCTTGG ACAAGTCACT GTCTCTGGGT TCAAGGTCTC TGTGTCTTTG AAATAAGGGG 5340 

TTGCCCCATG TGGGCTGTGT CTGTCCAAAC CTATTGAGGC AGGCTGGGAT GAGGGCAGGG 5400 

CTCCTGGGCC CGGTTACCTG TTGGGGTGTT GCAGTCTTGC CAGTACCAAT GGCCCACACA 5460 

SUBSTITUTE SHEET (RULE 26) 



wo 98/20023 PCT/AU97/00729 

-180- 

GGCTCATAGG CCAGGACGAC CTTGCTCCAG TCCTTCACGT TATCTGCAGG GCAGAGATAC 5520 
AGATGGAGGG AAGGGTGAAC AAGAAAGAGC TCTCCAGCCA GGTTCTCCGG AGTACGAAGA 5580 
ACGGTGGCCT ACTGCCCCCT AGTGGACATT GGGGG 5615 

(2) INFORMATION FOR SEQ ID NO: 48: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 263 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: protein

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:48: 

Met Gly Gln Thr Ala Leu Ala Gly Gly Ser Ser Ser Thr Pro Thr Pro
1 5 10 15

Gln Ala Leu Tyr Pro Asp Leu Ser Cys Pro Glu Gly Leu Glu Glu Leu
20 25 30

Leu Ser Ala Pro Pro Pro Asp Leu Gly Ala Gln Arg Arg His Gly Trp
35 40 45

Asn Pro Lys Asp Cys Ser Glu Asn Ile Glu Val Lys Glu Gly Gly Leu
50 55 60

Tyr Phe Glu Arg Arg Pro Val Ala Gln Ser Thr Asp Gly Ala Arg Gly
65 70 75 80

Lys Arg Gly Tyr Ser Arg Gly Leu His Ala Trp Glu Ile Ser Trp Pro
85 90 95

Leu Glu Gln Arg Gly Thr His Ala Val Val Gly Val Ala Thr Ala Leu
100 105 110

Ala Pro Leu Gln Thr Asp His Tyr Ala Ala Leu Leu Gly Ser Asn Ser
115 120 125

Glu Ser Trp Gly Trp Asp Ile Gly Arg Gly Lys Leu Tyr His Gln Ser
130 135 140

Lys Gly Pro Gly Ala Pro Gln Tyr Pro Ala Gly Thr Gln Gly Glu Gln
145 150 155 160

Leu Glu Val Pro Glu Arg Leu Leu Val Val Leu Asp Met Glu Glu Gly
165 170 175

Thr Leu Gly Tyr Ala Ile Gly Gly Thr Tyr Leu Gly Pro Ala Phe Arg
180 185 190

Gly Leu Lys Gly Arg Thr Leu Tyr Pro Ala Val Ser Ala Val Trp Gly
195 200 205

Gln Cys Gln Val Arg Ile Arg Tyr Leu Gly Glu Arg Arg Ala Glu Pro
210 215 220

His Ser Leu Leu His Leu Ser Arg Leu Cys Val Arg His Asn Leu Gly
225 230 235 240

Asp Thr Arg Leu Gly Gln Val Ser Ala Leu Pro Leu Pro Pro Ala Met
245 250 255

Lys Arg Tyr Leu Leu Tyr Gln
260
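SEQ ID NO:46 (264 residues) and SEQ ID NO:48 (263 residues) are closely related, and the claims express relatedness as a percentage "similarity". As a simpler illustration, the sketch below computes ungapped percent identity over residues 1-16 of each listing. It assumes the two stretches are already aligned without gaps; identity counts exact matches only and is therefore stricter than similarity, which usually also credits conservative substitutions. A real comparison would use a sequence-alignment program; `ungapped_identity()` is a hypothetical helper.

```python
def ungapped_identity(a: str, b: str) -> float:
    """Percent identity of two equal-length, already-aligned sequences.

    Counts exact position-by-position matches only: no gaps and no credit
    for conservative substitutions.
    """
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    matches = sum(x == y for x, y in zip(a, b))
    return 100.0 * matches / len(a)


# Residues 1-16 of SEQ ID NO:46 and SEQ ID NO:48, in one-letter code.
seq46_n = "MGQTALARGSSSTPTS"
seq48_n = "MGQTALAGGSSSTPTP"
print(ungapped_identity(seq46_n, seq48_n))  # 87.5
```

The two segments differ at positions 8 and 16 only, giving 14/16 identical residues.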

(2) INFORMATION FOR SEQ ID NO: 49: 

(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 28 base pairs

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 49: 
AGCTAGATCT GGACCCTACA ATGGCAGC 28 

(2) INFORMATION FOR SEQ ID NO: 50: 
(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 36 base pairs

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 50: 
AGCTAGATCT GCCATCCTAC TCGAGGGGCC AGCTGG 36
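In these listings the figure at the end of each line is the running position of its last base, so the LENGTH field of a short entry such as SEQ ID NO:49 or SEQ ID NO:50 can be checked directly against the sequence. A minimal sketch, assuming only the bases A, C, G, T (and N for an unknown base) are to be counted; `listed_length()` is a hypothetical helper.

```python
import re


def listed_length(listing: str) -> int:
    """Count nucleotides in a sequence-listing block.

    Keeps only A, C, G, T and N; the spaces between 10-base groups and the
    running position number at the end of each line are discarded.
    """
    return len(re.sub(r"[^ACGTN]", "", listing.upper()))


seq49 = "AGCTAGATCT GGACCCTACA ATGGCAGC 28"
seq50 = "AGCTAGATCT GCCATCCTAC TCGAGGGGCC AGCTGG 36"
print(listed_length(seq49), listed_length(seq50))  # 28 36
```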
CLAIMS:



1 . A nucleic acid molecule comprising a sequence of nucleotides encoding or complementary 
to a sequence encoding a protein or a derivative, homologue, analogue or mimetic thereof or a 
nucleotide sequence capable of hybridizing thereto under low stringency conditions at 42°C
wherein said protein comprises a SOCS box in its C-terminal region. 

2. A nucleic acid molecule according to claim 1 wherein the protein further comprises a 
protein:molecule interacting region. 

3. A nucleic acid molecule according to claim 2 wherein the protein:molecule interacting
region is located in a region N-terminal of the SOCS box. 

4. A nucleic acid molecule according to claim 2 or 3 wherein the protein:molecule 
interacting region is a protein:DNA binding region or a protein:protein binding region. 

5. A nucleic acid molecule according to claim 4 wherein the protein:molecule interacting
region is one or more of an SH2 domain, WD-40 repeats or ankyrin repeats. 

6. A nucleic acid molecule according to any one of claims 1-5 wherein the SOCS box
comprises the amino acid sequence:

X1 X2 X3 X4 X5 X6 X7 X8 X9 X10 X11 X12 X13 X14 X15 X16 [Xi]n X17 X18 X19 X20
X21 X22 X23 [Xj]n X24 X25 X26 X27 X28

wherein:

X1 is L, I, V, M, A or P;
X2 is any amino acid residue;
X3 is P, T or S;
X4 is L, I, V, M, A or P;
X5 is any amino acid;
X6 is any amino acid;
X7 is L, I, V, M, A, F, Y or W;
X8 is C, T or S;
X9 is R, K or H;
X10 is any amino acid;
X11 is any amino acid;
X12 is L, I, V, M, A or P;
X13 is any amino acid;
X14 is any amino acid;
X15 is any amino acid;
X16 is L, I, V, M, A, P, G, C, T or S;
[Xi]n is a sequence of n amino acids wherein n is from 1 to 50 amino acids
and wherein the sequence Xi may comprise the same or different amino
acids selected from any amino acid residue;
X17 is L, I, V, M, A or P;
X18 is any amino acid;
X19 is any amino acid;
X20 is L, I, V, M, A or P;
X21 is P;
X22 is L, I, V, M, A, P or G;
X23 is P or N;
[Xj]n is a sequence of n amino acids wherein n is from 1 to 50 amino acids
and wherein the sequence Xj may comprise the same or different amino
acids selected from any amino acid residue;
X24 is L, I, V, M, A or P;
X25 is any amino acid;
X26 is any amino acid;
X27 is Y or F; and
X28 is L, I, V, M, A or P.
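Because the SOCS box is defined position by position, with two variable-length linkers, it maps directly onto a regular expression. The sketch below is a minimal illustration, not the applicant's search method: it transcribes the residue classes X1-X28 from the claims (taking X22 as L, I, V, M, A, P or G, as written in claims 15, 21, 30 and 38) and writes each linker [Xi]n or [Xj]n as `.{1,50}`. `SOCS_BOX` and `find_socs_box()` are hypothetical names, and `re.search` reports only the leftmost match, so alternative alignments of the same box are not enumerated.

```python
import re
from typing import Optional, Tuple

HYDROPHOBIC = "[LIVMAP]"  # L, I, V, M, A or P
ANY = "."                 # any amino acid

# One regex fragment per position of the SOCS box, X1 ... X28.
# The linkers [Xi]n and [Xj]n (n from 1 to 50) become .{1,50}.
SOCS_BOX_PARTS = [
    HYDROPHOBIC, ANY, "[PTS]", HYDROPHOBIC,      # X1-X4
    ANY, ANY, "[LIVMAFYW]", "[CTS]", "[RKH]",    # X5-X9
    ANY, ANY, HYDROPHOBIC, ANY, ANY, ANY,        # X10-X15
    "[LIVMAPGCTS]",                              # X16
    ".{1,50}",                                   # [Xi]n
    HYDROPHOBIC, ANY, ANY, HYDROPHOBIC,          # X17-X20
    "P", "[LIVMAPG]", "[PN]",                    # X21-X23
    ".{1,50}",                                   # [Xj]n
    HYDROPHOBIC, ANY, ANY, "[YF]", HYDROPHOBIC,  # X24-X28
]
SOCS_BOX = re.compile("".join(SOCS_BOX_PARTS))


def find_socs_box(protein: str) -> Optional[Tuple[int, int]]:
    """Return 0-based (start, end) of the leftmost SOCS-box match, or None."""
    match = SOCS_BOX.search(protein.upper())
    return (match.start(), match.end()) if match else None


# Residues 222-264 (the C terminus) of SEQ ID NO:46, in one-letter code.
tail46 = "VEEPQSLLHLSRLCVRHALGDTRLGQISTLPLPPAMKRYLLYK"
print(find_socs_box(tail46))  # (3, 40)
```

The end index is exclusive, so the match corresponds to residues 225-261 of SEQ ID NO:46. The shortest possible SOCS box under this definition is 30 residues, so shorter inputs always return None.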

7. A nucleic acid molecule according to claim 6 wherein the protein modulates signal 
transduction. 
8. A nucleic acid molecule according to claim 7 wherein the signal transduction is modulated
by a cytokine or a hormone, a microbe or a microbial product, a parasite, an antigen or other 
effector molecule. 

9. A nucleic acid molecule according to claim 8 wherein the protein modulates cytokine- 
mediated signal transduction. 

10. A nucleic acid molecule according to claim 9 wherein the signal transduction is mediated 
by one or more of the cytokines EPO, TPO, G-CSF, GM-CSF, IL-3, IL-2, IL-4, IL-7, IL-13, 
IL-6, LIF, IL-12, IFNγ, TNFα, IL-1 and/or M-CSF.

11. A nucleic acid molecule according to claim 10 wherein the signal transduction is mediated 
by one or more of IL-6, LIF, OSM, IFN-γ and/or thrombopoietin.

12. A nucleic acid molecule according to claim 11 wherein the signal transduction is mediated
by IL-6. 

13. A nucleic acid molecule according to claim 1 wherein the nucleotide sequence encodes 
an amino acid sequence substantially as set forth in SEQ ID NO. 4, SEQ ID NO. 6, SEQ ID NO. 
8, SEQ ID NO. 10, SEQ ID NO. 12, SEQ ID NO. 14, SEQ ID NO. 18, SEQ ID NO. 21, SEQ 
ID NO. 25, SEQ ID NO. 29, SEQ ID NO. 36, SEQ ID NO. 41, SEQ ID NO. 44, SEQ ID NO.
46 or SEQ ID NO. 48 or an amino acid sequence having at least about 15% similarity to all or 
part of the listed sequences or a nucleotide sequence which hybridizes to the nucleic acid 
molecule under low stringency conditions at 42 °C. 

14. A nucleic acid molecule according to claim 1 wherein the nucleotide sequence is
substantially as set forth in SEQ ID NO. 3, SEQ ID NO. 5, SEQ ID NO. 7, SEQ ID NO. 9, SEQ
ID NO. 11, SEQ ID NO. 13, SEQ ID NO. 15, SEQ ID NO. 16, SEQ ID NO. 17, SEQ ID NO.
20, SEQ ID NO. 22, SEQ ID NO. 23, SEQ ID NO. 24, SEQ ID NO. 26, SEQ ID NO. 27, SEQ
ID NO. 28, SEQ ID NO. 30, SEQ ID NO. 31, SEQ ID NO. 32, SEQ ID NO. 33, SEQ ID NO.
34, SEQ ID NO. 35, SEQ ID NO. 37, SEQ ID NO. 38, SEQ ID NO. 39, SEQ ID NO. 40, SEQ
ID NO. 42, SEQ ID NO. 43, SEQ ID NO. 45 or SEQ ID NO. 47 or a nucleotide sequence
having at least 15% similarity to all or a part of the listed sequences or a nucleotide sequence
capable of hybridizing to the listed sequences under low stringency conditions at 42°C.

15. A nucleic acid molecule comprising a sequence of nucleotides encoding or complementary
to a sequence encoding a protein or a derivative, homologue, analogue or mimetic thereof or a
nucleotide sequence capable of hybridizing thereto under low stringency conditions at 42°C
wherein said protein exhibits the following characteristics:

(i) comprises a SOCS box in its C-terminal region wherein said SOCS box comprises
the amino acid sequence:

X1 X2 X3 X4 X5 X6 X7 X8 X9 X10 X11 X12 X13 X14 X15 X16 [Xi]n X17 X18 X19 X20
X21 X22 X23 [Xj]n X24 X25 X26 X27 X28

wherein: X1 is L, I, V, M, A or P;
X2 is any amino acid residue;
X3 is P, T or S;
X4 is L, I, V, M, A or P;
X5 is any amino acid;
X6 is any amino acid;
X7 is L, I, V, M, A, F, Y or W;
X8 is C, T or S;
X9 is R, K or H;
X10 is any amino acid;
X11 is any amino acid;
X12 is L, I, V, M, A or P;
X13 is any amino acid;
X14 is any amino acid;
X15 is any amino acid;
X16 is L, I, V, M, A, P, G, C, T or S;
[Xi]n is a sequence of n amino acids wherein n is from 1 to 50 amino acids
and wherein the sequence Xi may comprise the same or different amino
acids selected from any amino acid residue;
X17 is L, I, V, M, A or P;
X18 is any amino acid;
X19 is any amino acid;
X20 is L, I, V, M, A or P;
X21 is P;
X22 is L, I, V, M, A, P or G;
X23 is P or N;
[Xj]n is a sequence of n amino acids wherein n is from 1 to 50 amino acids
and wherein the sequence Xj may comprise the same or different amino
acids selected from any amino acid residue;
X24 is L, I, V, M, A or P;
X25 is any amino acid;
X26 is any amino acid;
X27 is Y or F;
X28 is L, I, V, M, A or P; and

(ii) comprises at least one of an SH2 domain, WD-40 repeats and/or ankyrin repeats
or other protein:molecule interacting domain in a region N-terminal of the SOCS box;
and

(iii) modulates signal transduction.

16. An isolated protein or a derivative, homologue or mimetic thereof comprising a SOCS 
box in its C-terminal region. 

17. An isolated protein according to claim 16 wherein the protein further comprises a 
protein:molecule interacting region. 



18. An isolated protein according to claim 17 wherein the protein:molecule interacting region
is located in a region N-terminal of the SOCS box.

19. An isolated protein according to claim 17 or 18 wherein the protein:molecule interacting
region is a protein:DNA binding region or a protein:protein binding region.

20. An isolated protein according to claim 19 wherein the protein:molecule interacting region 
is one or more of an SH2 domain, WD-40 repeats or ankyrin repeats. 

21. An isolated protein according to any one of claims 16-20 wherein the SOCS box
comprises the amino acid sequence:

X1 X2 X3 X4 X5 X6 X7 X8 X9 X10 X11 X12 X13 X14 X15 X16 [Xi]n X17 X18 X19 X20
X21 X22 X23 [Xj]n X24 X25 X26 X27 X28

wherein: X1 is L, I, V, M, A or P;
X2 is any amino acid residue;
X3 is P, T or S;
X4 is L, I, V, M, A or P;
X5 is any amino acid;
X6 is any amino acid;
X7 is L, I, V, M, A, F, Y or W;
X8 is C, T or S;
X9 is R, K or H;
X10 is any amino acid;
X11 is any amino acid;
X12 is L, I, V, M, A or P;
X13 is any amino acid;
X14 is any amino acid;
X15 is any amino acid;
X16 is L, I, V, M, A, P, G, C, T or S;
[Xi]n is a sequence of n amino acids wherein n is from 1 to 50 amino acids
and wherein the sequence Xi may comprise the same or different amino
acids selected from any amino acid residue;
X17 is L, I, V, M, A or P;
X18 is any amino acid;
X19 is any amino acid;
X20 is L, I, V, M, A or P;
X21 is P;
X22 is L, I, V, M, A, P or G;
X23 is P or N;
[Xj]n is a sequence of n amino acids wherein n is from 1 to 50 amino acids
and wherein the sequence Xj may comprise the same or different amino
acids selected from any amino acid residue;
X24 is L, I, V, M, A or P;
X25 is any amino acid;
X26 is any amino acid;
X27 is Y or F; and
X28 is L, I, V, M, A or P.

22. An isolated protein according to claim 21 wherein the protein modulates signal 
transduction. 

23. An isolated protein according to claim 22 wherein the signal transduction is modulated 
by a cytokine or other endogenous molecule, a hormone, a microbe or a microbial product, a 
parasite, an antigen or other effector molecule. 

24. An isolated protein according to claim 23 wherein the protein modulates cytokine- 
mediated signal transduction. 

25. An isolated protein according to claim 24 wherein the signal transduction is mediated 
by one or more of the cytokines EPO, TPO, G-CSF, GM-CSF, IL-3, IL-2, IL-4, IL-7, IL-13, 
IL-6, LIF, IL-12, IFNγ, TNFα, IL-1 and/or M-CSF.
26. An isolated protein according to claim 25 wherein the signal transduction is mediated by
one or more of IL-6, LIF, OSM, IFN-γ and/or thrombopoietin.

27. An isolated protein according to claim 26 wherein the signal transduction is mediated by 
IL-6. 

28. An isolated protein according to claim 16 wherein said protein comprises an amino acid 
sequence substantially as set forth in SEQ ID NO. 4, SEQ ID NO. 6, SEQ ID NO. 8, SEQ ID 
NO. 10, SEQ ID NO. 12, SEQ ID NO. 14, SEQ ID NO. 18, SEQ ID NO. 21, SEQ ID NO. 25, 
SEQ ID NO. 29, SEQ ID NO. 36, SEQ ID NO. 41, SEQ ID NO. 44, SEQ ID NO. 46 or SEQ 
ID NO. 48 or an amino acid sequence having at least about 15% similarity to all or part of the 
listed sequences. 

29. An isolated protein according to claim 16 wherein the said protein is encoded by a 
nucleotide sequence substantially as set forth in SEQ ID NO. 3, SEQ ID NO. 5, SEQ ID NO. 7, 
SEQ ID NO. 9, SEQ ID NO. 11, SEQ ID NO. 13, SEQ ID NO. 15, SEQ ID NO. 16, SEQ ID
NO. 17, SEQ ID NO. 20, SEQ ID NO. 22, SEQ ID NO. 23, SEQ ID NO. 24, SEQ ID NO. 26, 
SEQ ID NO. 27, SEQ ID NO. 28, SEQ ID NO. 30, SEQ ID NO. 31, SEQ ID NO. 32, SEQ ID 
NO. 33, SEQ ID NO. 34, SEQ ID NO. 35, SEQ ID NO. 37, SEQ ID NO. 38, SEQ ID NO. 39, 
SEQ ID NO. 40, SEQ ID NO. 42, SEQ ID NO. 43, SEQ ID NO. 45 or SEQ ID NO. 47 or a 
nucleotide sequence having at least 15% similarity to all or a part of the listed sequences or a 
nucleotide sequence capable of hybridizing to the listed sequences under low stringency 
conditions at 42°C. 

30. An isolated protein or a derivative, homologue, analogue or mimetic thereof having the
following characteristics:

(i) comprises a SOCS box in its C-terminal region wherein said SOCS box comprises
the amino acid sequence:

X1 X2 X3 X4 X5 X6 X7 X8 X9 X10 X11 X12 X13 X14 X15 X16 [Xi]n X17 X18 X19 X20
X21 X22 X23 [Xj]n X24 X25 X26 X27 X28

wherein: X1 is L, I, V, M, A or P;
X2 is any amino acid residue;
X3 is P, T or S;
X4 is L, I, V, M, A or P;
X5 is any amino acid;
X6 is any amino acid;
X7 is L, I, V, M, A, F, Y or W;
X8 is C, T or S;
X9 is R, K or H;
X10 is any amino acid;
X11 is any amino acid;
X12 is L, I, V, M, A or P;
X13 is any amino acid;
X14 is any amino acid;
X15 is any amino acid;
X16 is L, I, V, M, A, P, G, C, T or S;
[Xi]n is a sequence of n amino acids wherein n is from 1 to 50 amino acids
and wherein the sequence Xi may comprise the same or different amino
acids selected from any amino acid residue;
X17 is L, I, V, M, A or P;
X18 is any amino acid;
X19 is any amino acid;
X20 is L, I, V, M, A or P;
X21 is P;
X22 is L, I, V, M, A, P or G;
X23 is P or N;
[Xj]n is a sequence of n amino acids wherein n is from 1 to 50 amino acids
and wherein the sequence Xj may comprise the same or different amino
acids selected from any amino acid residue;
X24 is L, I, V, M, A or P;
X25 is any amino acid;
X26 is any amino acid;
X27 is Y or F;
X28 is L, I, V, M, A or P; and

(ii) comprises at least one of an SH2 domain, WD-40 repeats and/or ankyrin repeats
or other protein:molecule interacting domain in a region N-terminal of the SOCS box;
and

(iii) modulates signal transduction.

31. A method of modulating levels of a SOCS protein in a cell said method comprising 
contacting a cell containing a SOCS gene with an effective amount of a modulator of SOCS gene 
expression or SOCS protein activity for a time and under conditions sufficient to modulate levels 
of said SOCS protein. 

32. A method of modulating signal transduction in a cell containing a SOCS gene comprising 
contacting said cell with an effective amount of a modulator of SOCS gene expression or SOCS 
protein activity for a time sufficient to modulate signal transduction. 

33. A method of influencing interaction between cells wherein at least one cell carries a SOCS 
gene, said method comprising contacting the cell carrying the SOCS gene with an effective 
amount of a modulator of SOCS gene expression or SOCS protein activity for a time sufficient 
to modulate signal transduction. 

34. A method according to any one of claims 31-33 wherein signal transduction is mediated
by a cytokine, a hormone, a microbe or a microbial product, a parasite, an antigen or other 
effector molecule. 

35. A method according to claim 34 wherein the cytokine is one or more of EPO, TPO,
G-CSF, GM-CSF, IL-3, IL-2, IL-4, IL-7, IL-13, IL-6, LIF, IL-12, IFNγ, TNFα, IL-1 and/or
M-CSF.
36. A method according to claim 35 wherein the cytokine is one or more of IL-6, LIF, OSM,
IFN-γ and/or thrombopoietin.

37. A method according to claim 36 wherein the cytokine is IL-6. 

38. A method according to any one of claims 31-37 wherein the SOCS gene encodes a
protein having a SOCS box comprising the amino acid sequence:

X1 X2 X3 X4 X5 X6 X7 X8 X9 X10 X11 X12 X13 X14 X15 X16 [Xi]n X17 X18 X19 X20
X21 X22 X23 [Xj]n X24 X25 X26 X27 X28

wherein: X1 is L, I, V, M, A or P;
X2 is any amino acid residue;
X3 is P, T or S;
X4 is L, I, V, M, A or P;
X5 is any amino acid;
X6 is any amino acid;
X7 is L, I, V, M, A, F, Y or W;
X8 is C, T or S;
X9 is R, K or H;
X10 is any amino acid;
X11 is any amino acid;
X12 is L, I, V, M, A or P;
X13 is any amino acid;
X14 is any amino acid;
X15 is any amino acid;
X16 is L, I, V, M, A, P, G, C, T or S;
[Xi]n is a sequence of n amino acids wherein n is from 1 to 50 amino acids
and wherein the sequence Xi may comprise the same or different amino
acids selected from any amino acid residue;
X17 is L, I, V, M, A or P;
X18 is any amino acid;
X19 is any amino acid;
X20 is L, I, V, M, A or P;
X21 is P;
X22 is L, I, V, M, A, P or G;
X23 is P or N;
[Xj]n is a sequence of n amino acids wherein n is from 1 to 50 amino acids
and wherein the sequence Xj may comprise the same or different amino
acids selected from any amino acid residue;
X24 is L, I, V, M, A or P;
X25 is any amino acid;
X26 is any amino acid;
X27 is Y or F; and
X28 is L, I, V, M, A or P.



39. A method according to claim 38 wherein the SOCS gene comprises a nucleotide 
sequence selected from SEQ ID NO. 3, SEQ ID NO. 5, SEQ ID NO. 7, SEQ ID NO. 9, SEQ 
ID NO. 11, SEQ ID NO. 13, SEQ ID NO. 15, SEQ ID NO. 16, SEQ ID NO. 17, SEQ ID NO.
20, SEQ ID NO. 22, SEQ ID NO. 23, SEQ ID NO. 24, SEQ ID NO. 26, SEQ ID NO. 27, SEQ 
ID NO. 28, SEQ ID NO. 30, SEQ ID NO. 31, SEQ ID NO. 32, SEQ ID NO. 33, SEQ ID NO. 
34, SEQ ID NO. 35, SEQ ID NO. 37, SEQ ID NO. 38, SEQ ID NO. 39, SEQ ID NO. 40, SEQ 
ID NO. 42, SEQ ID NO. 43, SEQ ID NO. 45 or SEQ ID NO. 47. 



40. A method according to claim 38 wherein the SOCS gene encodes a protein comprising 
an amino acid sequence substantially as set forth in SEQ ID NO. 4, SEQ ID NO. 6, SEQ ID NO. 
8, SEQ ID NO. 10, SEQ ID NO. 12, SEQ ID NO. 14, SEQ ID NO. 18, SEQ ID NO. 21, SEQ 
ID NO. 25, SEQ ID NO. 29, SEQ ID NO. 36, SEQ ID NO. 41, SEQ ID NO. 44, SEQ ID NO. 
46 or SEQ ID NO. 48. 
_2^59 cgaggctcaagctccgggcggattctgcgtgccgctctcg

-120 ctccttggggtctgttggccggcctgtgccacccggacgcccggctcactgcctctgtct

-60 cccccatcagcgcagccccggacgctatggcccacccctccagctggcccctcgagtagg 

1 MVARNQVAADNAISPAAEPR 

1 ATGGTAGCACGCAACCAGGTGGCAGCCGACAATGCGATCTCCCCGGCAGCAGAGCCCCGA 

21 RRSEPSSSSSSSSPAAPVRP 

61 CGGCGGTCAGAGCCCTCCTCGTCCTCGTCTTCGTCCTCGCCAGCGGCCCCCGTGCGTCCC 

41 RPCPAVPAPAPGDTHFRTFR 

121 CGGCCCTGCCCGGCGGTCCCAGCCCCAGCCCCTGGCGACACTCACTTCCGCACCTTCCGC

61 SHSDYRRITRTSALLDACGF 

181 TCCCACTCCGATTACCGGCGCATCACGCGGACCAGCGCGCTCCTGGACGCCTGCGGCTTC

81 YWGPLSVHGAHERLRAEPVG

241 TATTGGGGACCCCTGAGCGTGCACGGGGCGCACGAGCGGCTGCGTGCCGAGCCCGTGGGC 

101 TFLVRDSRQRNCFFALSVKM 

301 ACCTTCTTGGTGCGCGACAGTCGTCAACGGAACTGCTTCTTCGCGCTCAGCGTGAAGATG 

121 ASGPTSIRVHFQAGRFHLDG 

361 GCTTCGGGCCCCACGAGCATCCGCGTGCACTTCCAGGCCGGCCGCTTCCACTTGGACGGC 

141 SRETFDCLFELLEHYVAAPR 

421 AGCCGCGAGACCTTCGACTGCCTTTTCGAGCTGCTGGAGCACTACGTGGCGGCGCCGCGC

161 RMLGAPLRQRRVRPLQELCR 

481 CGCATGTTGGGGGCCCCGCTGCGCCAGCGCCGCGTGCGGCCGCTGCAGGAGCTGTGTCGC

181 QRIVAAVGRENLARIPLNPV 

541 CAGCGCATCGTGGCCGCCGTGGGTCGCGAGAACCTGGCGCGCATCCCTCTTAACCCGGTA

201 LRDYLSSFPFQI* 

601 CTCCGTGACTACCTGAGTTCCTTCCCCTTCCAGATCtgaccggctgccgctgtgccgcag 

661 cattaagtgggggcgccttattatttcttattattaattattattatttttctggaacca 
721 cgtgggagccctccccgcctgggtcggagggagtggttgtggagggtgagatgcctccca 
781 cttctggctggagacctcatcccacctctcaggggtgggggtgctcccctcctggtgctc
841 cctccgggtcccccctggttgtagcagcttgtgtctggggccaggacctgaattccactc 
901 ctacctctccatgtttacatattcccagtatctttgcacaaaccaggggtcggggagggt 
961 ctctggcttcatttttctgctgtgcagaatatcctattttatatttttacagccagttta 
1021 ggt^ata^ctttattatgaaagtttttttttaaaagaaaaaaaaaaaaaaaaaa 
FIG 9(1)

FIG 49

[Figure residue; surviving panel labels: Blot: αJAK2; Blot: αFLAG]






[Figure residue, panel A; surviving labels: JAK2, IgH, SOCS1, SOCS2, SOCS3; DNA: JAK2+ with S1, S2, S3 or CIS; IP: αJAK2, Blot: αFLAG; Blot: αFLAG; Blot: αPY]

FIG 50






INTERNATIONAL SEARCH REPORT 


International Application No. 




PCT/AU 97/00729 



A. CLASSIFICATION OF SUBJECT MATTER 



Int Cl⁶: C07K 2/00 

According to International Patent Classification (IPC) or to both national classification and IPC 
B. FIELDS SEARCHED 

Minimum documentation searched (classification system followed by classification symbols) 



Documentation searched other than minimum documentation to the extent that such documents are included in the fields searched 



Electronic data base consulted during the international search (name of data base and, where practicable, search terms used) 
STN Peptide sub sequence search 

STN [LIVMAP].[PTS]..[LIVMAP]..[LIVMAFYW][CTS][RKH]..[LIVMAP]{3}[LIVMAPGCTS].{1/'}[LIVMAP]..[LIVMAP]P[LIVMAPG][PN].{ij'}[LIVMAP]..[YF][LIVMAP] 



C. 



DOCUMENTS CONSIDERED TO BE RELEVANT 



Category* 



Citation of document, with indication, where appropriate, of the relevant passages 



Relevant to claim No. 



X,P 



X 



WO 96/39427 (Trustees of Dartmouth College) 
12 December 1996 
The whole document 

Yeast vol 12 No 15 issued 1996 Delaveau, Th et al. 

"Analysis of a 23 kb region on the left arm of yeast chromosome IV" 

pages 1587-1592 

Science vol 270 No 5234 issued 1995 Labeit, S et al "Titins: giant proteins in charge 
of muscle ultrastructure and elasticity" 
pages 293-6 



1-40 



1-40 



1-40 



□ 



Further documents are listed in the 
continuation of Box C 



See patent family annex 

Special categories of cited documents: 

"A" document defining the general state of the art which is not considered to be of particular relevance 
"E" earlier document but published on or after the international filing date 
"L" document which may throw doubts on priority claim(s) or which is cited to establish the publication date of another citation or other special reason (as specified) 
"O" document referring to an oral disclosure, use, exhibition or other means 
"P" document published prior to the international filing date but later than the priority date claimed 
"T" later document published after the international filing date or priority date and not in conflict with the application but cited to understand the principle or theory underlying the invention 
"X" document of particular relevance; the claimed invention cannot be considered novel or cannot be considered to involve an inventive step when the document is taken alone 
"Y" document of particular relevance; the claimed invention cannot be considered to involve an inventive step when the document is combined with one or more other such documents, such combination being obvious to a person skilled in the art 
"&" document member of the same patent family 



Date of the actual completion of the international search 
20 November 1997 


Date of mailing of the international search report 

12 DEC 1997 


Name and mailing address of the ISA/AU 

AUSTRALIAN INDUSTRIAL PROPERTY ORGANISATION 

PO BOX 200 

WODEN ACT 2606 

AUSTRALIA Facsimile No.: (02) 6285 3929 


Authorized officer 
K.F. PECK 

Telephone No.: (02) 6283 2263 



Form PCT/ISA/210 (second sheet) (July 1992) COPCHL 



INTERNATIONAL SEARCH REPORT 



International Application No. 
PCT/AU 97/00729 



C (Continuation). DOCUMENTS CONSIDERED TO BE RELEVANT 


Category* 


Citation of document, with indication, where appropriate, of the relevant passages 


Relevant to 
claim No. 


X 


The EMBO Journal Vol 14 No 12 issued 1995 Yoshimura, A et al "A novel cytokine- 
inducible gene CIS encodes an SH2-containing protein that binds to tyrosine- 
phosphorylated interleukin 3 and erythropoietin receptors" 
pages 2816-26 


1-40 


X 


AU, A, 27924/95 (Flügge, U.I.) 17 August 1995 
The whole document, particularly pages 29-32 


1-40 


X 


Biochemistry vol 34 No 8 issued 1995 Weber, A et al "The 2-oxoglutarate/malate translocator 
of chloroplast envelope membranes: molecular cloning of a transporter containing a 12-helix 
motif and expression of the functional protein in yeast cells" 
pages 2621-7 


1-40 


X 


Journal of Bacteriology Vol 176 No 24 issued 1994 Iwai, A et al "Molecular cloning and 
expression of an isomalto-dextranase gene from Arthrobacter globiformis T6" 
pages 7730-4. 


1-40 


X 


Nucleic Acids Research Vol 22 No 11 issued 1994 Althoff, S et al "Molecular evolution of 
SRP cycle components: functional implications" 
pages 1933-47 


1-40 


X 


Nature vol 368 No 6466 issued 1994 Wilson, R et al "2.2 Mb of contiguous nucleotide sequence 
from chromosome III of C. elegans" 
pages 32-8. 


1-40 


X 


The EMBO Journal Vol 11 No 5 issued 1992 Labeit, S et al "Towards a molecular 
understanding of titin" 
pages 1711-16 


1-40 


X 


Advances in Biophysics Vol 33 (Muscle Elastic Proteins) issued 1996 Kolmerer, B et al "A 
systematic search of the data bases for sequences homologous to titin/connectin" 
pages 3-11 




X 


Microbiology Vol 142 No 8 issued 1996 Yoneyama, H "Protein C (OprC) of the outer 
membrane of Pseudomonas aeruginosa is a copper-regulated channel protein" 
pages 2137-2144. 


1-40 


X 


Journal of Bacteriology Vol 178 No 15 issued 1996 Limberger, R et al. "Organisation, 
transcription and expression of the 5' region of the fla operon of Treponema phagedenis and 
Treponema pallidum" 
pages 4628-4634. 


1-40 


X 


The Journal of Cell Biology Vol 133 No 6 issued 1996 Goodson, H. V et al "Synthetic lethality 
screen identifies a novel yeast myosin I gene (MYO5): myosin I proteins are required for 
polarisation of the actin cytoskeleton" 
pages 1277-1291 


1-40 


X 


Genes and Development Vol 9 No 24 issued 1995 Herrscher, R F et al "The immunoglobulin 
heavy-chain matrix-associating regions are bound by Bright: a B cell-specific trans-activator 
that describes a new DNA-binding protein family" 
pages 3067-82 


1-40 



Form PCT/ISA/210 (continuation of second sheet) (July 1992) COPCHL 



INTERNATIONAL SEARCH REPORT 

Information on patent family members 



This Annex lists the known "A" publication level patent family members relating to the patent documents cited 
in the above-mentioned international search report. The Australian Patent Office is in no way liable for these 
particulars which are merely given for the purpose of information. 



International Application No. 
PCT/AU 97/00729 



Patent Document Cited in Search Report    Patent Family Member 

AU A 27924/95    CA 2192849    DE 4420782    EP 765393 

HY V 9603441 - WO 95/34654 



END OF ANNEX 



Form PCT/ISA/210 (extra sheet) (July 1992) COPCHL 



